THE VERIFICATION AND SCORING
OF WEATHER FORECASTS 


IRVING I. GRINGORTEN
Air Force Cambridge Research Center 


In scoring a forecast, one of two purposes should be con- 
sidered: either to determine the utility of the forecast or to 
determine the skill of the forecaster. If the purpose is utility, 
then the score for each forecast should be directly propor- 
tional to the value of the forecast in meeting specified opera- 
tional requirements. If skill is the purpose, then the total 
score for a series of forecasts should be a reflection of the fore- 
caster’s ability to analyze and classify a weather situation for 
forecasting purposes; within a well-defined class of antecedent 
weather the probability of a subsequent event, like rain, is 
increased above (or decreased below) the relative frequency 
of that event in the total of all weather situations, as estab- 
lished in a long period of time. 


I. INTRODUCTION 


Much has been written on the verification of forecasts. While some
have minimized the importance of adhering to a system of verifi- 
cation, most authors have recognized its necessity. With no system of 
scoring, the usefulness of forecasting, generally speaking, would remain 
in doubt; the forecasters themselves would not know if, or when, they 
show improvement in their art, or whether one forecaster is doing 
better than another. Two questions are asked repeatedly: Is it better 
to perpetually forecast the most frequent event, say, “no rain,” if it 
will average better than a supposedly skillful prediction? Will the fore- 
caster be correct most frequently if he forecasts persistence, that is, 
today’s weather repeated as the forecast for tomorrow? As long as 
these questions are open, to the embarrassment of the forecaster, then 
the last report on verification and scoring has not been written. 
There have been several surveys of the literature [1, 2, 3] reviewing 
some 55 papers on verification. The earliest papers were written in 1884 
[4, 5, 6, 7]. More recently, Glenn Brier [8, 9] has devised systems to
verify probability forecasts. An excellent critique has been written by 
personnel of the USAF Air Weather Service [10], but not published. 
While it is not my intention to review the previous papers, it seems 
advisable to cite the key points: 

1. It is desirable to determine the utility of a forecast and to deter- 
mine at what point it ceases to have any value [7, 11]. 

2. Consideration should be given to predictions of non-occurrence 
as well as predictions of occurrence [5, 7]. 

3. Forecasts should be compared with “blind” forecasts to determine 
whether the forecasts are better than pure guesswork or better than 
those forecasts obtained by a non-skilled rule of thumb. From the 
standpoint of skill, percentage of “hits” per se is meaningless [6, 12]. 

4. The weight (or score) attached to a forecast should be affected 
by the importance of the forecast, the difficulty of making an accurate 
forecast, and the proximity of the forecast to the verification [7, 10,
11, 12, 13, 14, 15].

5. For a suitable system of verification and scoring, forecasts should 
be clearly and unambiguously stated so that they can be checked 
quickly and accurately against the subsequent events [general agree- 
ment among the authors]. 

6. A forecast will have greater value if the forecaster becomes ac- 
quainted with the specific operational requirements [16]. 

7. Skill in forecasting has been defined as the number of hits in 
excess of those obtainable by chance, or by forecasting persistence, or 
by perpetually forecasting the event that has occurred most frequently 
in the past [12]. 

The above principles are acceptable for this paper except for two 
key points. First, a clear distinction should have been made between the 
utility of a forecast and the skill of the forecaster. Failure to emphasize 
this distinction has undoubtedly caused much of the difficulty of verifi- 
cation and scoring. Secondly, skill should be redefined, as is done in the 
second paragraph below. 

The utility of a forecast should be judged by the operational re- 
quirement; in fact, the forecast becomes the working assumption. It 
might well be that the forecaster cannot contribute more than the 
climatic data and frequencies for given purposes of operation; or the 
persistence forecast (tomorrow’s weather will be the same as today’s) 
might be the best working assumption for certain needs. It is impor- 
tant, therefore, to devise a scoring system so that the score allotted to 
a forecaster would be an index of the value of his forecast for given 
operational needs. The prediction of fog at a given airport might carry 
with it five times as much weight, taken operation-wise, as the prediction
of clear skies. Moreover, there might be some weight attached to
a “nearly correct” forecast in proportion to its operational value; the
prediction of low clouds at the given airport, where fog ultimately
develops, might carry with it a definite score because it would have 
alerted the operations office to the possibility of fog. Lastly, a forecast 
might become important in the light of the present state of the weather 
[12]; if the airport is closed by fog now, it becomes important to fore- 
cast its time of opening. 

In defining skill, this paper departs from previous reports: Skill is 
the forecaster’s ability to analyze and classify the antecedent weather so 
that, within one class, the probability of a subsequent event is increased 
above, or decreased below, the relative frequency of that event in all weather 
situations (hereafter referred to as the “climatic frequency”). For 
example, if the climatic frequency of rain, obtained from 10 years of 
record, is 5% and the forecaster recognizes in the present situation a 
20% probability for rain, then he is exhibiting his knowledge and 
skill. On the other hand, if we accept as skill “the number of hits in 
excess of those obtainable through chance forecasts,” we would be 
forced to conclude that, with respect to a rare event of 5% frequency, 
the forecaster would exhibit no skill at all if he could not obtain more 
than 95% of his forecasts correct. 

A single forecast cannot be made to meet the operational needs 
and a test of skill at one and the same time, except by pure coincidence. 
The forecaster should be asked to make two separate forecasts, one for 
use, and the other for a test of his skill; and the two forecasts may be 
in disagreement. The operational requirements themselves might call 
for several forecast statements or working assumptions, aside from
any test of skill [3]. The methods of verification and scoring, described 
below, form two groups, those that test the operational value of the 
forecasts, and those that test the skill of the forecaster. 


II. METHODS 


To proceed further it is convenient to introduce symbols for the 
events which are to be forecast. Let $X_0, X_1, \ldots, X_n$ denote $(n+1)$
mutually exclusive events. For example, $X_0$ might represent ceiling
(i.e. cloud height) zero, $X_1$ might represent ceiling 100 to 400 feet, $X_2$
might represent ceiling 500 to 900 feet, and so on. The symbol $X_i$
denotes any class of event of the kind $X$.

The required form of the forecast might be one of the following: (1) 
For his forecast, the forecaster must select a single event from the 
(n+1) mutually exclusive events, but he obtains partial credit for his 
choice depending upon the nearness of his choice to the observed event; 
(2) The forecaster can select more than one event as alternate possibilities,
but his credit, upon verification, is greater if he has selected
fewer alternates; (3) The forecaster must group the $(n+1)$ events
dichotomously and select one group as his forecast. Each of these
forms of the forecast is described below: 


1a. Requirement: a single choice from (n+1) mutually exclusive events;
test of operational value. 


The forecaster usually faces a situation in which he sees the chances 
for several alternate developments. For example, clouds might develop 
low enough to close the airport, or low enough to restrict the number of 
flights in and out of the airport, or high enough to permit normal sched- 
ules. One of these possible developments must be chosen as the working 
assumption of the operations office. While the best assumption is the 
event that verifies, yet there is generally value in an assumption that 
proves nearly correct. 

Let $\alpha_{ki}$ be the score obtained by the forecaster if his prediction was
$X_k$ but the subsequent event is $X_i$; then $\alpha_{ii}$ is the perfect score. A table
of scores would resemble Table 1. For operational purposes the score
should be made directly proportional to the net profits resulting
from the assumption (or forecast). Sometimes such a score would be
negative, representing losses incurred by the operator. The score for
an incorrect forecast must not be greater than the score for the correct
forecast, so that $\alpha_{ki} \le \alpha_{ii}$. It is the job of the operations office to choose
reasonable values for the $(n+1)^2$ scores $\alpha_{ki}$. How this might be done
is illustrated by the example following Table 1.


TABLE 1. Scores for each combination of forecast and observed
event; antecedent condition: $X_m'$ (schematic)

[Table body not recoverable. Rows: forecast events $X_0, X_1, X_2, \ldots$; columns: observed events; each cell holds the score $\alpha_{ki}$ for forecast $X_k$ and observed event $X_i$.]


Let us suppose that the rate at which airplanes can be accepted
into an airport,¹ like LaGuardia Airport, is governed by the flying
weather classification at the airport. That is, for each class $X_i$ of the
flying weather (determined by cloud base, cloud top and visibility)
there is a fraction $f_i$ of the maximum rate at which airplanes could enter
the airport. Then the forecaster of an airline, essentially, will be forecasting
$f_k$, and the operations office simply will want $f_k$ as a working
assumption. Let $T$ be the net profit resulting to an airline from each
successful flight. Let $L$ be the loss resulting from an incorrect decision
or an unsuccessful flight; the loss would be due to overhead, the salaries
of the crew, cost of gasoline and loss of good-will expressed in dollars
and cents. Let $l$ be the net cost resulting from cancellation; this should
be the overhead such as the cost of rentals, maintenance of idle equipment
and salaries. Let $\alpha_{ki}$, the score, be identical with the mean net
profit resulting from the operation of the fraction $f_k$ of the normal number
of flights and the cancellation of the remaining flights. Then


(1)  $\alpha_{ki} = f_k T - (1 - f_k)\,l$  for $f_k \le f_i$,
     $\alpha_{ki} = f_i T - (f_k - f_i)\,L - (1 - f_k)\,l$  for $f_k > f_i$.

In the first case every scheduled flight succeeds; in the second only the
fraction $f_i$ succeeds and the excess $f_k - f_i$ incurs the loss $L$.
If $T=3$, $L=2$, $l=1$ (purely hypothetical), and if there are three acceptance
rates: $f_0=0$, $f_1=1/2$, $f_2=1$, then from equations (1) the scores
would be those in Table 2.

¹ The term “acceptance rate” has been used in this connection by members of the Air Navigation
Development Board, a technical advisory group under the Secretaries of Defense and Commerce.
The writer is indebted to this group for a description of this problem.


TABLE 2. Scores that are directly proportional to net profits
or losses, if $T=3$, $L=2$, $l=1$ (see text)

Forecast, or                   Verified
Working Assumption     $f_0$      $f_1$      $f_2$

$f_0$                   -1         -1         -1
$f_1$                   -3/2        1          1
$f_2$                   -2          1/2        3
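Equation (1) can be checked against Table 2 numerically. The following Python sketch is illustrative only; the function name `operational_score` and its argument names are assumptions introduced here and do not appear in the paper.

```python
from fractions import Fraction

def operational_score(f_k, f_i, T, L, l):
    """Score alpha_ki of equation (1): assumed rate f_k, verified rate f_i."""
    if f_k <= f_i:
        # Every scheduled flight succeeds; the fraction 1 - f_k was cancelled.
        return f_k * T - (1 - f_k) * l
    # Only the fraction f_i succeeds, f_k - f_i fails, and 1 - f_k was cancelled.
    return f_i * T - (f_k - f_i) * L - (1 - f_k) * l

# The three hypothetical acceptance rates of the text: f0 = 0, f1 = 1/2, f2 = 1.
rates = [Fraction(0), Fraction(1, 2), Fraction(1)]

# Rows are working assumptions f_k; columns are verified rates f_i.
table2 = [[operational_score(f_k, f_i, T=3, L=2, l=1) for f_i in rates]
          for f_k in rates]

for row in table2:
    print([str(score) for score in row])
```

Exact fractions are used so that entries such as -3/2 and 1/2 print in the same form as in Table 2.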








Is the above example a realistic one? From the point of view of an 
airline operator it is undoubtedly oversimplified. But, if a forecaster is 
asked to predict a single event, then, essentially, he is being asked to 
make the operational decision, because he cannot predict the weather 
precisely. If it is admitted that the forecast is, in fact, the working 
assumption, then it must also be admitted that the forecast must be 
governed by the economics of the operation. Either the decisions must 
remain forever subjective, or a table of scores, such as Table 2, should
be computed to guide the forecaster or operator in his decisions. 

In s trials the forecaster’s rating, F, is the ratio of his accumulated 
score to the accumulated score of perfect forecasting: 


$F = \dfrac{\sum_{s} \alpha_{ki}}{\sum_{s} \alpha_{ii}}$


The forecasting system that maximizes this rating is the preferred
system for the given operational requirements. This rating has the 
maximum value of unity, but may assume negative values since the 
scores reflect the profits, and oftentimes there are losses instead of 
profits. If the persistence forecast yields the greatest rating, F, then it 
is proper and fitting to forecast by persistence for the given operations 
and neglect the skill of the forecaster. 

If this author is permitted to digress from the main theme of this 
paper, it is desirable to show how the forecaster might decide which 
working assumption is best for the operations office. Faced with his 
own classification of antecedent weather, the forecaster might realize 
that, for today’s class of weather, each acceptance rate has a certain 
probability of verification. Within the forecaster’s designated class,
let $p_0, p_1, \ldots, p_m, \ldots, p_n$ be the true probabilities of the acceptance
rates $f_0, f_1, \ldots, f_n$. (In this discussion the $p_m$’s are not subjective
probabilities, although they may be so difficult to determine that, in
practice, the forecaster will have to resort to approximations. In
theory, each $p_m$ must be assumed to exist as the limit of the relative
frequency of the subsequent event $X_m$ within today’s class of antecedent
weather.) Then


$\sum_{m=0}^{n} p_m = 1.$


Within the class of weather as analyzed by the forecaster, the expected
average profit netted to the management under the working assumption
$f_k$ will be proportional to $A_k$ in the equation


(2)  $p_0\alpha_{k0} + p_1\alpha_{k1} + \cdots + p_m\alpha_{km} + \cdots + p_n\alpha_{kn} = A_k.$


Equation (2) represents $(n+1)$ equations, one for each working assumption.
The best selection by the forecaster or operations office is
that working assumption which maximizes $A_k$.
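The selection rule of equation (2) can be sketched in a few lines of Python. The scores are those of Table 2; the probabilities `p` are hypothetical values chosen for illustration, and the function name `expected_scores` is likewise an assumption of this sketch.

```python
def expected_scores(p, alpha):
    """Equation (2): A_k = sum over m of p[m] * alpha[k][m], one value per row k."""
    return [sum(p_m * a for p_m, a in zip(p, row)) for row in alpha]

# Table 2: rows are working assumptions f0, f1, f2; columns are verified rates.
alpha = [[-1.0, -1.0, -1.0],
         [-1.5,  1.0,  1.0],
         [-2.0,  0.5,  3.0]]

# Hypothetical true probabilities of f0, f1, f2 within today's class of weather.
p = [0.25, 0.5, 0.25]

A = expected_scores(p, alpha)
best = max(range(len(A)), key=A.__getitem__)   # index k of the largest A_k
print(A, best)
```

With these assumed probabilities the full-rate assumption $f_2$ gives the largest $A_k$; shifting enough probability toward $f_0$ would instead favor a more cautious assumption.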


1b. Requirement: A single choice from (n+1) mutually exclusive events; 
test of skill. 


To test the forecaster’s skill, the method of scoring should force
him to select, as his forecast, an event which, in view of the antecedent 
weather, has a probability of occurrence greater than the climatic 
frequency of that event. (It could be argued that this is not a forecast 
in the usual sense of the word, but, undeniably, it is a forecast in the 
general sense [17].) The following paragraphs are devoted to finding 
values for the $\alpha_{ki}$’s (Table 1).

Let $p_{c0}, p_{c1}, \ldots, p_{cn}$ be the climatic frequencies of each future event
$X_0, X_1, \ldots, X_n$ for the given period of the year and for the given
time of day, as established over a long period of time. Then


$\sum_{m=0}^{n} p_{cm} = 1.$


If a forecaster blindly selects $X_k$ as his forecast throughout an arbitrary
set of $N$ days, then the fraction $p_{c0}$ of the $N$ days contributes
$p_{c0}\alpha_{k0}$ to his expected average score, the fraction $p_{c1}$ contributes
$p_{c1}\alpha_{k1}$, and so on; or, for all $N$ days, his expected average
score would be


$\sum_{m=0}^{n} p_{cm}\alpha_{km}.$


In accordance with this paper’s definition of skill, it is necessary to
make it no more profitable for the “blind” forecaster to predict any one
event, throughout an arbitrary set of $N$ days, than to predict any
other event, that is, to make his expected average score equal to a low
constant, say 1.0. Thus,


(3)  $\sum_{m=0}^{n} p_{cm}\alpha_{im} = 1$  for any $X_i$ $(0 \le i \le n)$.


There are several relations that are self-evident. First, the score for
a partially correct forecast should be always less than the score for the
totally correct forecast. Or,

(4)  $\alpha_{ki} < \alpha_{ii}$  $(k \ne i)$.

Secondly, the score for a completely erroneous forecast should be zero.
Or,

(5)  $\alpha_{0n} = \alpha_{n0} = 0.$

Let us suppose that a forecaster had predicted consistently one
future event for a class of $N$ days, but the true probabilities of $X_0,
X_1, \ldots, X_n$ within that class of $N$ days are $p_0, p_1, \ldots, p_n$. To begin
with, we may suppose that the true probability of $X_k$ increased at the
expense of the others. That is,


$p_k > p_{ck}$  and  $p_m \le p_{cm}$  for $m \ne k$.


But since


$\sum_{m=0}^{n} p_m = \sum_{m=0}^{n} p_{cm} = 1,$


(6)  $p_k - p_{ck} = \sum_{m \ne k} (p_{cm} - p_m) = \sum_{m \ne k} |p_{cm} - p_m|.$

By the definition of skill, and accepting the value of 1.0 for no skill,
it is necessary that the forecasting of $X_k$ on each of the $N$ days, when $N$
is large, should net the forecaster a true average score greater than 1.0,
and the forecasting of anything else should net him a true average
score of 1.0 or less.

That is,


(7)  $\sum_{m=0}^{n} p_m\alpha_{km} > 1$


and


(8)  $\sum_{m=0}^{n} p_m\alpha_{im} \le 1$  for $i \ne k$.

From (7) and (3)


(9)  $(p_k - p_{ck})\,\alpha_{kk} > \sum_{m \ne k} (p_{cm} - p_m)\,\alpha_{km}.$

In view of relation (6), relation (9) is valid for the necessary and sufficient
condition

(10)  $\alpha_{kk} > \alpha_{km}$  $(m \ne k)$.


(Note the difference between relation (10) and relation (4).) Also,
from (8) and (3)


(11)  $(p_k - p_{ck})\,\alpha_{ik} \le \sum_{m \ne k} (p_{cm} - p_m)\,\alpha_{im}$  for $i \ne k$.


Therefore, in view of relation (6) it is necessary that

(12)  $\alpha_{ik} \le \alpha_{im}$  for $m \ne k$, $i \ne k$.


Similarly, by interchanging the roles of $X_m$ and $X_k$, it can be shown
that

$\alpha_{ik} \ge \alpha_{im}$  for $k \ne m$, $i \ne m$,


whence

(13)  $\alpha_{ik} = \alpha_{im} = \beta_i$  for $k \ne m$, $i \ne m$, $i \ne k$.


The treatment, so far, has still allowed some variability in the
scores ($\alpha_{ki}$). Next, let us suppose that the class of $N$ days is such that
the true likelihoods of both $X_k$ and $X_0$ increased in proportion to their
climatic frequencies. That is,


(14)  $\dfrac{p_k}{p_{ck}} = \dfrac{p_0}{p_{c0}} > 1.$
By the definition of skill, it is necessary, for $N$ large, that the forecasting
of $X_k$ should net the forecaster the same true average score as the
forecasting of $X_0$, both scores greater than 1.0. That is, using relations
(13) and (5),

(15)  $p_0\alpha_{00} = p_k\alpha_{kk} + (1 - p_k)\,\beta_k.$

From (13), (5) and (3)

(16)  $p_{c0}\alpha_{00} = 1.$

From (13) and (3)

(17)  $\alpha_{kk} = \dfrac{1 - \beta_k(1 - p_{ck})}{p_{ck}}.$

From (14), (15), (16), (17) $\beta_k = 0$; or

(18)  $\alpha_{km} = 0$  for $m \ne k$.


Returning to the $(n+1)$ relations (3), it follows that

(19)  $\alpha_{ii} = \dfrac{1}{p_{ci}}.$


Hence, when the forecast requirement is to choose a single event from $(n+1)$
mutually exclusive events, the skill score for a perfect forecast is inversely
proportional to the climatic frequency of the observed event; the score for
an incorrect forecast, however close to the observed event, is zero.

Professor Joseph G. Bryan, Massachusetts Institute of Technology,
who collaborated in the preparation of [3], explained, in a personal
interview, that the above result confirms the convictions that they
had felt after their investigations of scores. Whenever they attempted
to give partial credit for a nearly correct forecast, they compromised
the system to the point where it was possible to “play” the system,
usually by forecasting the most frequent event or by forecasting
persistence.

It can be proved that the true average score for a forecast of any
event $X_k$ whose $p_k > p_{ck}$ will be greater than unity. However, from
(19) it follows that the true average score will be greatest for the event
for which $(p_k - p_{ck})/p_{ck}$ is greatest. That is, the forecaster should forecast
the event whose likelihood is increased most in proportion to its
climatic frequency.

Ordinarily it is easier to make a prediction of the weather following
one basic condition than it is to make the same prediction following
another basic condition. For this reason persistence forecasts are
oftentimes good forecasts; the forecaster is more likely to be correct
about clear skies six hours later if he observes clear skies at the time
of his forecast. A table of scores, therefore, should be computed for each
basic class of the weather (e.g. Table 5). If $X_i$ denotes a class of cloud
heights, then $X_m'$ denotes the class of the cloud height at deadline time
when the forecast is due. The climatic frequencies $p_{c0}, p_{c1}, \ldots, p_{cn}$
should be determined following each class of basic weather. If this is
done, then, through relation (3), the true average score for “blind”
forecasting will become unity (1.0), regardless of whether the “blind”
forecast is a perpetual forecast of one event, or a forecast of persistence,
or a forecast by pure chance.
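The behavior of the ideal scores of relations (18) and (19) can be illustrated with the rain example of the Introduction (climatic frequency 5%, probability 20% within the forecaster's class). The Python sketch below is an illustration under those assumed figures; the function names are introduced here for convenience and are not part of the paper.

```python
def ideal_skill_scores(p_c):
    """Relations (18) and (19): alpha_ii = 1 / p_c[i]; every other score is zero."""
    n = len(p_c)
    return [[1.0 / p_c[i] if i == k else 0.0 for i in range(n)]
            for k in range(n)]

def true_average_score(p, alpha, k):
    """Long-run average score for always forecasting X_k when the true
    probabilities of the events are p."""
    return sum(p_m * a for p_m, a in zip(p, alpha[k]))

p_c = [0.05, 0.95]        # climatic frequencies: rain, no rain
alpha = ideal_skill_scores(p_c)

# "Blind" forecasting: the true probabilities equal the climatic frequencies.
blind = [true_average_score(p_c, alpha, k) for k in range(len(p_c))]

# Within the forecaster's class, rain has probability 0.20 instead of 0.05.
p_today = [0.20, 0.80]
skilled = [true_average_score(p_today, alpha, k) for k in range(len(p_c))]

print(blind, skilled)
```

Under these assumptions both blind strategies average unity, a forecast of rain averages about 4.0, and a forecast of no rain averages about 0.84, so the scores reward the event whose likelihood has increased most relative to its climatic frequency.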

In $s$ trials the forecaster’s rating $F$ is the ratio of his accumulated
score to the accumulated score for perfect forecasting:


(20)  $F = \dfrac{\sum_{s} \alpha_{ki}}{\sum_{s} \alpha_{ii}}.$


If it is felt that the rating for skill, $R$, should be zero for blind forecasting
and unity for perfect forecasting, then the formula is


(21)  $R = \dfrac{\sum_{s} \alpha_{ki} - s}{\sum_{s} \alpha_{ii} - s},$


where the subtracted $s$ is the accumulated score expected from blind
forecasting, which averages 1.0 per trial.

Here, then, is a system of verification and scoring which requires
little effort in procedure. However, it is too stark for a verification
program involving competition among forecasters. If applied, it would
yield average scores that would be too erratic, unless the competition
were extended over a very long period of time. It seems better, for
practical purposes, to compromise with rigor, to accept relations (3, 4,
5, 7), and (10) but to reject relation (8) in the sense that it will not be
met rigorously. If this is done, then one can still be assured, by relation
(7), that a skillful forecaster will have a true average score greater than
unity, and, by equation (3), that the “blind” forecaster will have a
true average score of unity. It is clear, therefore, that the forecaster
will obtain a true average score greater than unity only if he has
knowledge and experience with the weather sufficient to enable him
to reckon with the increased or decreased likelihood of each future
event.

The left-hand side of relation (8) may be greater than unity, but the
scoring system will not suffer if


(22)  $\sum_{m=0}^{n} p_m\alpha_{km} > \sum_{m=0}^{n} p_m\alpha_{im}$  for $i \ne k$.
Upon assigning adjusted scores, such as those in Table 6, this writer
has not been able to “play” the system, that is, to extract a high average 
score out of Table 6 by forecasting a less-likely-than-usual event rather 
than an event of increased likelihood. 
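The ratings $F$ and $R$ of equations (20) and (21) are simple to accumulate from a record of forecasts and verifications. The following Python sketch assumes a hypothetical two-class score table and a hypothetical four-day record; neither is taken from Tables 5 to 7, and the function name `ratings` is introduced only for this illustration.

```python
def ratings(trials, alpha):
    """Return (F, R) for a list of (forecast index k, observed index i) pairs.

    F is the accumulated score divided by the accumulated perfect score;
    R rescales it so that blind forecasting (average score 1.0 per trial)
    rates zero and perfect forecasting rates unity.
    """
    s = len(trials)
    earned = sum(alpha[k][i] for k, i in trials)
    perfect = sum(alpha[i][i] for _, i in trials)
    F = earned / perfect
    R = (earned - s) / (perfect - s)
    return F, R

# Hypothetical ideal scores for climatic frequencies 0.25 and 0.75.
alpha = [[4.0, 0.0],
         [0.0, 4.0 / 3.0]]

# Hypothetical record: three hits, then one miss (forecast X_1, observed X_0).
trials = [(0, 0), (1, 1), (1, 1), (1, 0)]

F, R = ratings(trials, alpha)
print(round(F, 3), round(R, 3))
```

Because the miss occurred on the rarer, higher-scoring event, the rating for skill falls further below unity than the percentage of hits (three out of four) would suggest.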

As an example, let us choose the flying weather classification at
Randolph Field, Texas (Table 3). Let us consider only the winter 
months of December, January and February, the forecast to be made 


TABLE 3. Flying Weather Classification. H denotes ceiling
height (ft), V denotes visibility (miles)

Symbol   Item                          Description

$X_1$    Airport Closed                V < 1 or 0 ≤ H < 500 or both
$X_2$    Instrument Flight Rules       1 ≤ V < 3 and H ≥ 500, or
                                       V ≥ 3 and 500 ≤ H < 1,000
$X_3$    Contact with Low Clouds       V ≥ 3 and 1,000 ≤ H ≤ 2,000
$X_4$    Contact; low clouds absent    V ≥ 3 and H > 2,000





at deadline time 1230 CST, for the minimum conditions in the subse- 
quent morning hours 0430 to 1030 CST. From 10 years of climato- 
logical data, the frequencies have been established as shown in Table 4. 
Solutions of equations (3, 18, and 19) yield the scores shown in Table 
5 in which some of the scores are large while most of the scores are 
zero. Table 6, on the other hand, was prepared subject to the relations 


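TABLE 4. Frequency of the lowest flying weather classification between
0430 and 1030 CST in Dec., Jan., Feb., at Randolph Field, Texas

[Table body largely not recoverable. Rows: classification at 1230 CST ($X_1'$ to $X_4'$); columns: frequency of each classification between 0430 and 1030 CST. Surviving entries, column unknown: 6% (first row), 18% (second row), 20% (third row).]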
TABLE 5. Ideal skill scores for each combination of forecast and verification of
the flying weather classification at Randolph Field, Texas, Dec., Jan., or Feb.
Forecast time is 1230 CST; verification period is between 0430 and 1030 CST

[Table body not recoverable. Four panels, one for each antecedent class $X_1'$ to $X_4'$; within each panel, rows are forecasts and columns are verifications.]


TABLE 6. Adjusted skill scores for each combination of forecast and verification
of the flying weather classification at Randolph Field, Texas, Dec., Jan., or Feb.
Forecast time is 1230 CST; verification period is between 0430 and 1030 CST

[Table body not recoverable. Four panels, one for each antecedent class $X_1'$ to $X_4'$; within each panel, rows are forecasts and columns are verifications.]

(3, 4, 5) and (10); but the conditions (18) and (19) were replaced by
the arbitrary relations 


(23)  $\alpha_{ki} = \alpha_{ii} / [\ldots]$  [denominator illegible in the source]

In the Air Force Geophysical Research Directorate, a trial project 
was chosen, the forecast of minimum flying weather classification at 
Randolph Field, Texas between the hours of 0430 CST and 1030 CST 
inclusive (Table 3). There were two daily forecasts which were due at 
deadlines 1230 CST of the day before and 0330 CST of the same day. 
For the first deadline, the table of scores was Table 6; for the second 
deadline a similar table was constructed. 


TABLE 7a. Ratings for skill (R); daily forecasts between 1 Dec. 1949 and 28 Feb.
1950 of lowest flying weather classification at Randolph Field,
Texas in the morning hours 0430-1030 CST

[Table body not recoverable. Rows: forecasters A, B, C; columns: deadline 1230 CST (67 days) and deadline 0330 CST (70 days).]

Note: No-skill score is zero; perfect score is 1.00. Standard deviation of each
score is approximately ±0.07.


TABLE 7b. Forecaster B’s ratings for skill and percentage of hits in the forecasting
of lowest flying weather classification at Randolph
Field, Texas during the 90 days of the winter months

Season     Measure               Deadline 1230 CST    Deadline 0330 CST

1948-49    Rating for skill      0.16                 0.11
           Percentage of hits    47%                  70%

1949-50    Rating for skill      0.17                 0.18
           Percentage of hits    48%                  62%





Three forecasters, A, B, C, made forecasts on 67 to 70 days between
1 Dec. 1949 and 28 Feb. 1950 (Table 7a). Forecaster B, who had made
his forecasts for all 90 days in the 1949-50 winter season, had made
similar forecasts for the 90 days of the previous winter season (Table
7b).
The point raised by Tables 7a and 7b is that, aside from random
variations, the three forecasters did not improve their skill in the 0330
CST forecast as compared with their skill in the 1230 CST forecast.
Forecaster B, who had been making his forecasts for operational use, 
had a relatively high percentage of “hits” (Table 7b); moreover, his 
percentage of hits increased for the shorter-range forecasts. But, judged 
from the point of view of skill, his forecasts did not display more skill 
for the shorter-range forecasts than for the longer-range forecasts. 

The above method of verification and scoring should prove satis- 
factory for the forecasting of certain weather elements such as tempera- 
ture, pressure and amount of rainfall, which vary continuously from 
low values to high values. However, the method will prove unsatisfac- 
tory for the verification of an element such as “weather phenomenon.” 
If he forecasts rain, the forecaster should not necessarily obtain credit 
when fog develops, wherefore the following requirement is now con- 
sidered: 


2a. Requirement: One or more choices from (n+1) mutually exclusive 
events; test of operational value. 


Let $\alpha_{k \ldots j}$ be the score for a correct forecast of the combination of
events $X_k, X_{k+1}, \ldots, X_j$ treated as a single event. It is the job of the
operations office to choose reasonable values for the $\alpha_{k \ldots j}$’s. If the
weather is classed as “no weather,” “fog” or “precipitation,” the opera- 
tions office might consider the forecast of fog as useless when rain 
verifies, and vice versa. A forecast of fog and precipitation as alternate 
possibilities might carry some weight, although less than the forecast 
of either event separately. 


2b. Requirement: One or more choices from (n+1) mutually exclusive 
events; test of skill.


This requirement lends itself to a satisfactory system of scoring for
skill. If $p_{c0}, p_{c1}, \ldots, p_{cn}$ are the climatic frequencies of each future
event, for the given period of the year, for the given time of day,
and for the given basic antecedent condition, $X_m'$, then, as in section
1b above, to make it no more profitable for the “blind” forecaster to
predict one combination than another,


(24)  $\alpha_{k \ldots j} \sum_{m=k}^{j} p_{cm} = 1.$


Or, the score $\alpha_{k \ldots j}$ is inversely proportional to the combined climatic frequency
of the events $X_k, X_{k+1}, \ldots, X_j$ treated as a single event.

As an example, let us consider the weather phenomenon at Mitchell
Field, L.I., classifying the weather as in Table 8. Let the forecasting
be done in the month of July, deadline at 1430 EST, the forecast for
0230 EST of the following day. From 12 years of data, the frequencies
of the events at 0230 EST of the following day were found as in Table
9. The derived scores are shown in Table 10.


TABLE 8. Classification of Weather Phenomenon

Symbol    Description of Class

$X_0$     No weather
$X_1$     Obstruction to vision other than fog
$X_2$     Fog
$X_3$     Precipitation








TABLE 9. Frequency of each class of weather phenomenon
at 0230 EST at Mitchell Field, L.I., July

Class of weather phen.          Classification at 0230 EST
at 1430 EST of preceding day    $X_0$     $X_1$     $X_2$     $X_3$

$X_0'$                          59.0%     15.1%     17.2%      8.7%
$X_1'$ or $X_2'$                27.8%     34.8%     29.6%      7.8%
$X_3'$                          17.6%     17.6%     58.9%      5.9%














TABLE 10. Score for each combination of forecast at 1430 EST and verification
of weather phenomenon at 0230 EST of following day at Mitchell Field, L.I., July

[Table body not recoverable. Columns: antecedent classification, namely no weather ($X_0'$); obstruction to vision, or fog ($X_1'$ or $X_2'$); precipitation or thunderstorm ($X_3'$). Cells: the score for each forecast combination.]

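Scores of the kind tabulated in Table 10 follow directly from equation (24) and the frequencies of Table 9. The Python sketch below recomputes them for the antecedent class $X_0'$ only; the values it prints are derived here from equation (24), not transcribed from Table 10, and the function name `combination_score` is an assumption of this sketch.

```python
from itertools import combinations

# Climatic frequencies from Table 9, antecedent class X0' (no weather at 1430 EST).
p_c = {"X0": 0.590, "X1": 0.151, "X2": 0.172, "X3": 0.087}

def combination_score(events, p_c):
    """Equation (24): score for a correct forecast that names the given events
    as alternate possibilities, treated as a single event."""
    return 1.0 / sum(p_c[e] for e in events)

# Every forecast of one, two, or three alternates out of the four classes.
scores = {}
for size in range(1, len(p_c)):
    for combo in combinations(p_c, size):
        scores[combo] = round(combination_score(combo, p_c), 2)

for combo, score in scores.items():
    print(" or ".join(combo), score)
```

A forecast naming all four classes is omitted because it is always correct and scores exactly 1.0, the same as blind forecasting.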
The forecaster’s rating, $F$, is the ratio of the accumulated score of
the forecaster to the accumulated score for perfect forecasting:


$F = \dfrac{\sum^{t} \alpha_{k \ldots j}}{\sum^{s} \alpha_{i}}$


where $t$ is the number of “hits” in $s$ trials. If it is felt that the rating
should be zero for “blind” forecasting and unity for perfect forecasting,
then in $s$ trials, the rating for skill is


$R = \dfrac{\sum^{t} \alpha_{k \ldots j} - s}{\sum^{s} \alpha_{i} - s}$


3a. Requirement: Dichotomous grouping of the (n+1) events; test of
operational value.


A requirement of this nature might become important at an airport 
where several types of aircraft are used, or where the pilots have 
varying flying qualifications. The forecaster must state whether the 
ceiling will be above the minimum for one airplane and pilot, above 
the minimum for another, and so on. We might suppose that $\alpha_k$ is the
score for a correct forecast of ceiling equal to or below $X_k$, and $\beta_k$
the score for a correct forecast above $X_k$. It is the job of the operations
office to choose reasonable values for the $\alpha_k$’s and $\beta_k$’s.


3b. Requirement: Dichotomous grouping of the (n+1) events; test of 
skill. 


Again, let us use the forecast of ceiling as an example. It is necessary
that the forecaster make $n$ statements: first, as to whether the ceiling
will be below or above $X_0$, secondly, whether below or above $X_1$,
and so on. If $p_{c0}, p_{c1}, \ldots, p_{cn}$ are the climatic frequencies of $X_0,
X_1, \ldots, X_n$ respectively, then it becomes equally profitable to the
“blind” forecaster to predict ceilings above $X_k$ or equal to or below
$X_k$, if


$\alpha_k \sum_{m=0}^{k} p_{cm} = \beta_k \left(1 - \sum_{m=0}^{k} p_{cm}\right).$


If, in addition,


$\alpha_k + \beta_k = 10,$


then


$\alpha_k = 10\left(1 - \sum_{m=0}^{k} p_{cm}\right)$  and  $\beta_k = 10 \sum_{m=0}^{k} p_{cm}.$


Such a system has the advantage that scores will range between zero
and 10; but the forecaster must make $n$ statements on the weather.
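The dichotomous scores can be tabulated from the cumulative climatic frequencies. In the Python sketch below the four frequencies are hypothetical, and the function name `dichotomous_scores` is introduced for illustration only.

```python
def dichotomous_scores(p_c, total=10.0):
    """For each threshold k = 0 .. n-1 return (alpha_k, beta_k): the scores for a
    correct forecast of 'at or below X_k' and of 'above X_k' respectively."""
    scores = []
    cumulative = 0.0
    for p in p_c[:-1]:                 # n thresholds for n + 1 classes
        cumulative += p                # sum of p_cm for m = 0 .. k
        scores.append((total * (1.0 - cumulative), total * cumulative))
    return scores

# Hypothetical climatic frequencies of the ceiling classes X0 .. X3.
p_c = [0.125, 0.125, 0.25, 0.5]
scores = dichotomous_scores(p_c)

cumulative = 0.0
for k, (alpha_k, beta_k) in enumerate(scores):
    cumulative += p_c[k]
    blind_below = alpha_k * cumulative          # always forecasting "at or below"
    blind_above = beta_k * (1.0 - cumulative)   # always forecasting "above"
    print(k, alpha_k, beta_k, blind_below, blind_above)
```

For every threshold the two blind strategies earn the same expected score, which is the condition the scores were chosen to satisfy; the rarer outcome of each dichotomy carries the larger score.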


III. REMARKS 


This paper, which grew out of a re-examination of previous programs 
of verification, was written in an effort to correct the previous weak- 
nesses before another large-scale test of the abilities of forecasters is 
tried. One primary conclusion is that, to test their skill, forecasters 
should make special forecasts, independent of their operational duties, 
which forecasts might be inconsistent with the operational forecasts. 
There is, however, one feature common to the two forecasts: the fore- 
caster’s ability to analyze the antecedent weather into classes within 
which the probabilities of each subsequent event will differ substan- 
tially from its climatic frequency. It is this feature which should make 
a test of skill a good criterion of the usefulness of the forecaster’s 
techniques in most operational forecasts. 

Space will not permit a careful appraisal of the applicability of the 
above methods of scoring. I will simply say that tabulating machine 
methods could make a program of verification and scoring feasible 
with relatively few man-hours of labor. It would be possible to deter- 
mine the relative merits of two or more forecasters, to determine how 
much easier it is to meet one requirement at one hour than it is to 
meet the same requirement at another hour, to determine whether it 
is more difficult to make forecasts in one season than in another, 
whether one forecaster has an advantage over another by being close 
to the station for which he is forecasting, and to determine whether 
it is more difficult to make a forecast under one prevailing condition 
than it is under another, such as continuous rain. 

As a last remark, the principles and methods of this paper could 
easily apply to periodical forecasts of any kind. That is why this article 
appears in a statistical instead of a meteorological journal.


STATISTICS IN PRODUCTION AND INSPECTION*


Edwin G. Olds
Carnegie Institute of Technology


INTRODUCTION 


During the past ten years there has been a phenomenal growth in
the use of Shewhart control charts in improving the quality of 
industrial products. Likewise, manufacturers have greatly increased 
their use of standard sampling plans in the inspection of purchased 
materials. In view of the wide areas of application still untouched, 
these trends seem likely to continue. 
An appraisal of the future for statistics in production and inspection 
focuses attention on three worthy questions: 
1) What contribution can the statistician make to greater and more 
effective utilization of present statistical methods in industry? 
2) What part can the statistician play in developing new statistical 
methods and in making them readily available? 
3) How can the quality control engineer get the maximum benefit 
from the work of the statistician? 
To suggest tentative answers to these three interlocking questions is 
the object of the present paper. 


THE CONTROL CHART 


In attempting to answer the first question the statistician will be in- 
terested in discovering what statistical methods are now in use. It 
seems fair to assume that the majority of quality control engineers place 
much dependence on the Shewhart control chart technique and on 
standard sampling tables. The formal statistical education of many 
quality control engineers began with a basic course in Statistical Qual- 
ity Control of the sort offered at various universities during the last war 
with the help of the Office of Production Research and Development, 
and the United States Office of Education. This has been supplemented 
by individual reading and study, and, in some cases, by advanced work 
in theory and practice. 

The application of the control chart at the beginning, at least, follows 
the pattern prescribed by the American War Standard, Control Chart 
Method of Controlling Quality During Production [1]. In brief, it consists 
of three stages. In the first stage, sample data on a quality
characteristic are collected at time intervals from the manufacturing
process. In the second stage the data are charted, analyzed for
homogeneity, and used, if possible, to predict the nature of future
product. If the process gives sufficient evidence of stability and of
its ability to produce quality meeting specifications, standards are
set for the limits of variability of means and ranges of samples from
future production. In the third stage samples are collected at time
intervals and, if the variability of either of these statistics is
outside the limits set, the cause is investigated.

* Presented at a joint meeting of the American Statistical Association,
American Society for Quality Control (Chicago Section), and the
Institute of Mathematical Statistics, Chicago, December 28, 1950.

Examining the three stages in a little more detail for the control
charts for measurements, suppose that in the first stage, samples of five
units are measured. Then, in the second stage the average, $\bar{X}$, is
calculated for each sample and plotted in order on one chart, and the
range, $R$, for each is plotted in order on another chart. Usually about
twenty-five groups of five are used in this stage. The grand average,
$\bar{\bar{X}}$, and the average range, $\bar{R}$, are calculated. On the
$\bar{X}$-chart a “center line” is drawn corresponding to the value of
$\bar{\bar{X}}$ and control limits corresponding to
$\bar{\bar{X}} \pm A_2\bar{R}$. (For samples of five, $A_2 = 0.577$.) On the
$R$ chart the center line is drawn corresponding to $\bar{R}$ and an upper
control limit corresponding to $D_4\bar{R}$. (For samples of five,
$D_4 = 2.114$.) The appearance of any point outside the control band on
either chart is interpreted as meaning that an assignable cause of
variability existed at the time the sample was taken. If all points on
both charts are inside the control bands, it is assumed that the process
is in control, and then the natural tolerance limits for the product are
estimated as $\bar{\bar{X}} \pm 3\bar{R}/d_2$. (For this case, $d_2 = 2.326$.)
If these limits do not fall outside specification limits
the process is considered to be satisfactory.
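
The second-stage computation can be sketched as follows. The factors are
those quoted above for samples of five; the function names and the four
sample groups are hypothetical, and in practice about twenty-five groups
would be used.

```python
from statistics import mean

# Factors for samples of five, as quoted in the text.
A2, D4, D2 = 0.577, 2.114, 2.326


def natural_tolerance_limits(grand_mean, mean_range, d2=D2):
    """Estimate the natural tolerance limits, grand mean +/- 3 R-bar / d2."""
    half_width = 3.0 * mean_range / d2
    return grand_mean - half_width, grand_mean + half_width


def xbar_r_chart(samples):
    """Return chart limits and the indices of out-of-control samples."""
    xbars = [mean(s) for s in samples]
    ranges = [max(s) - min(s) for s in samples]
    grand, rbar = mean(xbars), mean(ranges)
    limits = {
        "xbar_center": grand,
        "xbar_lcl": grand - A2 * rbar,
        "xbar_ucl": grand + A2 * rbar,
        "r_center": rbar,
        "r_ucl": D4 * rbar,
    }
    # A point outside either control band suggests an assignable cause.
    flagged = [
        i for i, (x, r) in enumerate(zip(xbars, ranges))
        if x < limits["xbar_lcl"] or x > limits["xbar_ucl"] or r > limits["r_ucl"]
    ]
    return limits, flagged


# Hypothetical measurements, four groups of five units.
groups = [
    [10, 12, 11, 13, 14],
    [11, 11, 12, 10, 11],
    [12, 13, 12, 14, 14],
    [10, 9, 11, 12, 13],
]
limits, flagged = xbar_r_chart(groups)
print(limits, flagged)
print(natural_tolerance_limits(limits["xbar_center"], limits["r_center"]))
```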

Probably it will be of no help to the quality control engineer to tell 
him that he ought to calculate and use the standard deviations for the 
individual samples, rather than the ranges. For a single sample of five
units from a normal universe it is well known [2, p. 391] that the
relative precision (ratio of variances) of unbiased estimates of the
universe standard deviation based on sample standard deviation and on
sample range is 0.962, but it is very doubtful whether the additional
cost of computing the more precise estimate can be justified in most
quality control applications.

The conjecture that there is no great loss in the use of $\bar{R}/d_2$ as an
estimate of $\sigma'$ (the universe standard deviation) is confirmed by study
of recent papers by Grubbs and Weaver [3], Lord [4, 5], Patnaik [6],
and Pearson [7]. Grubbs and Weaver make the interesting discovery
that if $n$ data are available for estimating $\sigma'$ and average range is to be
the basis, then, for $n$ large, the data should be divided into groups of 8.
For $n = 96$, for example, they would use 12 groups. The standard
deviation of such an estimate is 0.0831 as compared with 0.0726 for the
standard deviation of the estimate based on the standard deviation for the
96 values.
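
The two figures can be reproduced approximately under some assumptions
that the text does not state: that they are expressed in units of
$\sigma'$, that the universe is normal, and that the range constants for
groups of 8 are the usual tabulated values $d_2 = 2.847$ and
$d_3 = 0.820$. The helper names below are hypothetical.

```python
import math


def c4(n):
    """Expected value of s (divisor n - 1) in units of sigma, normal universe."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    )


def rel_sd_from_s(n):
    """Standard deviation of the unbiased estimate s / c4, in units of sigma."""
    return math.sqrt(1.0 - c4(n) ** 2) / c4(n)


def rel_sd_from_mean_range(groups, d2, d3):
    """Standard deviation of R-bar / d2 over independent groups, in units of sigma."""
    return d3 / (d2 * math.sqrt(groups))


# Assumed tabulated constants for groups of 8.
D2_N8, D3_N8 = 2.847, 0.820

print(round(rel_sd_from_mean_range(12, D2_N8, D3_N8), 4))  # 12 groups of 8
print(round(rel_sd_from_s(96), 4))                         # one sample of 96
```

Both results should come out close to the values quoted above, which
puts the loss from using ranges at roughly one-seventh in standard
deviation.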


A DIGRESSION 


Before continuing this over-simplified and meager discussion of the 
control chart technique it seems useful to digress long enough to note 
that the statistician can be very helpful in bolstering the courage and 
knowledge of the quality control engineer by giving him the pertinent 
results of such papers as those cited above. Judging from the warm re- 
ception accorded a paper on “Practical Applications of New Theory” 
[8] given by Mosteller and Tukey at the Third Annual ASQC Conven- 
tion at Boston in 1949, quality control engineers are grateful for such 
service. 

In the introduction to their printed paper, the above authors note, 

“Many statistical developments are not used in practice until long 
after their discovery. Sometimes this happens because the material is 
written in difficult mathematical language which most users do not 
have the time or equipment to translate. In other cases the develop- 
ments are published in journals not ordinarily read by the potential 
users. For example, a general development in statistics may be pub- 
lished in a journal for geneticists in an article about specialized flora. 
Quality control people are not likely to be aware of such a result. 
Quality control people have been quick to apply the latest methods 
which have been brought to their attention, and they frequently ask 
for and need new and different tools to help solve their more specialized 
problems.” 

The facts and implications of the above quotation go a long way in
the direction of giving answers to the questions proposed at the
beginning of the present paper.


THE CONTROL CHART (CONTINUED) 


It is quite obvious to the statistician that the data collected in the 
first stage of the control chart analysis could be treated by the analysis 
of variance. Craig [9] and Scheffé [10] have compared the two types of 
analysis in recent papers, the former solving a specific problem by both 
methods and discussing the relative advantages of each. For the simple 
case under consideration here, time is the single criterion used as a basis 
for classification and, using standard procedures, the variance estimated 
from variation “between times” could be compared to the variance 
estimated from variation “within times.” However, the 25 “within
times” estimates should not be combined to provide a single estimate
until they are tested for homogeneity. Customarily the Bartlett test
[16, pp. 387-388] is used for this purpose. The job accomplished by the
control charts for average and range does not seem to be quite the same 
as that done by analysis of variance. Furthermore it is debatable 
whether the additional computation required by the latter can be 
justified. This is a question which the statistician might investigate 
further. 

Nevertheless, the quality control engineer will find much use for the
principles of analysis of variance and familiar ground is a natural point 
of departure for the learning process. For example, it is instructive to 
apply the analysis of variance to part of Wyatt Lewis’ well-known data 
on the rheostat knob. Most of the “alumni” of the earlier intensive 
courses in quality control by statistical methods are familiar with this 
interesting application of the control chart technique since the data for 
several periods of processing were given in the manual [11] used in these 
courses. 

Grant [12, pp. 27-31] has the control chart analysis for the first 
twenty-seven samples of five units, from which Lewis made his prelim- 
inary appraisal of the process. Based on the control chart analysis, the 
process is judged to be in statistical control. Since, in units of
one-thousandth of an inch, $\bar{\bar{X}} = 140.6$ and $\bar{R} = 8.6$, the
natural tolerance limits for the process can be estimated as 129.5 and
151.7. However,
an analysis of variance, using time as the single criterion, provides a 
“between-times” estimate of variance of 6.5 based on 26 degrees of 
freedom and a residual variance estimate of 13.65, based on 108 degrees 
of freedom. Compared to the latter, the former is surprisingly small. 

The data are arranged in rows and columns, with each row repre- 
senting a sub-group. It seems safe to assume that the entries in the first 
column are the first samples in the sub-groups, that those in the second 
column are the second samples, and so on. Therefore, columns might 
provide a second criterion. The “between column” estimate of variance, 
based on 4 degrees of freedom, is 56.4, while the new estimate for error 
variance is 12.0. Since the ratio of 56.4 to 12.0 is 4.7, and the 1% point 
for the F-test is about 3.51, it is hard to attribute all of the difference 
between columns to random variation. 
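
The rheostat-knob measurements themselves are not reproduced here, so the
following sketch uses a small made-up table. It shows the computation
only: each row is a sub-group, each column a position within the
sub-group, and the function (a hypothetical name) returns each variance
estimate with its degrees of freedom.

```python
def two_criterion_anova(table):
    """Return {source: (mean square, degrees of freedom)} for a rows x cols table."""
    rows, cols = len(table), len(table[0])
    n = rows * cols
    grand = sum(map(sum, table)) / n
    row_means = [sum(r) / cols for r in table]
    col_means = [sum(r[j] for r in table) / rows for j in range(cols)]

    ss_total = sum((x - grand) ** 2 for r in table for x in r)
    ss_rows = cols * sum((m - grand) ** 2 for m in row_means)
    ss_cols = rows * sum((m - grand) ** 2 for m in col_means)
    ss_within = ss_total - ss_rows   # residual with rows as the only criterion
    ss_error = ss_within - ss_cols   # residual after removing columns as well

    df_error = (rows - 1) * (cols - 1)
    return {
        "between_rows": (ss_rows / (rows - 1), rows - 1),
        "within_rows": (ss_within / (n - rows), n - rows),
        "between_cols": (ss_cols / (cols - 1), cols - 1),
        "error": (ss_error / df_error, df_error),
    }


# Hypothetical data: four sub-groups of three units each.
data = [
    [10, 12, 15],
    [11, 12, 14],
    [9, 13, 15],
    [10, 11, 16],
]
result = two_criterion_anova(data)
f_ratio = result["between_cols"][0] / result["error"][0]
print(result)
print("F for columns:", round(f_ratio, 2))
```

For a table of 27 rows and 5 columns the degrees of freedom come out as
26, 108, 4 and 104, which agrees with the figures quoted in the text.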

If, in such a situation, there proved to be a practical basis for the
conclusion that the five columns were samples from universes with
different means, then the propriety of the original control chart analysis
is open to suspicion. It is clear that the range for each sub-group no
longer is a random sample from a single universe and, therefore, the
factors for control limits, which were calculated on that assumption,
may be so inaccurate as to be practically useless. Discussions of such
difficulties and methods of overcoming them are given in two recent
papers [13, 14].

There are three further comments on the Lewis data. First, even with 
the revised estimate of error variance, the small relative size of the 
variation from sub-group to sub-group still is surprising. Second, and 
more important, it is very questionable whether the detection of dif- 
ferences between columns and revision of the control limits would have 
been of practical consequence. Finally, the above discussion should not 
be interpreted as any implied criticism of Lewis or Grant or anyone 
else (including the present author) who has applied the control chart 
technique to these data without deflating the estimate of internal
variance.


IS THE PROCESS SATISFACTORY? 


The quality control engineer usually assumes that if the analysis of
25 samples of 5 units each, taken from past production in time order,
gives no indication of lack of control then practically all of the future
products can be expected to fall within the limits
$\bar{\bar{X}} \pm 3\bar{R}/d_2$, if control is maintained. If the quality
control engineer is worried about the interpretation of the phrase
“practically all,” the statistician can help
him with it, especially if approximate normality can be assumed.

In his fundamental paper of 1941 on tolerance limits [15, pp. 94-95],
Wilks supplies the necessary theory from which to calculate the
expected proportion of a normal universe included between the limits
$\bar{X} \pm ks$, where $\bar{X}$ and $s$ are calculated for a single sample
of size $n$ ($s$ being Fisherian). When $k = 3.006$ (and $n = 125$) the
expected value of $ks$ is equal to $3\sigma'$, which also is the expected
value of $3\bar{R}/d_2$. Then, calculation from Wilks’ theory gives the
expected proportion between the limits $\bar{X} \pm 3.006s$ as approximately
0.9967. This is an approximation to the proportion included between the
limits $\bar{\bar{X}} \pm 3\bar{R}/d_2$. Use of Bowker’s
helpful tables [16, p. 105] suggests that the confidence is about 0.99
that $\bar{X} \pm 3.006s$ includes at least 99% of the normal universe.
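
The value of k can be checked directly. For a normal universe the
expected value of s (computed with divisor n - 1) is a known multiple of
$\sigma'$, conventionally written $c_4$; that symbol and the helper name
below are not used in the text. The multiplier wanted is then $3/c_4$.

```python
import math


def expected_s_factor(n):
    """E[s] / sigma for a normal universe, with s using divisor n - 1."""
    return math.sqrt(2.0 / (n - 1)) * math.exp(
        math.lgamma(n / 2.0) - math.lgamma((n - 1) / 2.0)
    )


n = 125                       # 25 groups of 5 pooled into one sample
k = 3.0 / expected_s_factor(n)
print(round(k, 3))            # multiplier making E[k * s] equal to 3 sigma
```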

As an alternative to the above procedure, one might conjecture
that, since the expected value of $\bar{\bar{X}} \pm 3\bar{R}/d_2$ is
$\bar{X}' \pm 3\sigma'$ and the standard error of estimate is about
$0.24\sigma'$ for the combined sample of 25 groups
of 5, it would be satisfactory in practical work to treat
$\bar{\bar{X}} \pm 3\bar{R}/d_2$ as though it were $\bar{X}' \pm 3\sigma'$ and
proceed accordingly. Then it is possible to estimate the percentage of
product outside specification limits and
decide whether or not the process is satisfactory.
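
A sketch of that estimate follows, assuming normality and treating the
sample figures as though they were the universe values. The function
name and the specification limits are hypothetical; the grand average and
average range are the rheostat-knob figures quoted earlier.

```python
from statistics import NormalDist


def fraction_outside(grand_mean, mean_range, d2, lower_spec, upper_spec):
    """Estimated fraction of product outside the specification limits."""
    sigma = mean_range / d2                  # R-bar / d2 stands in for sigma'
    universe = NormalDist(mu=grand_mean, sigma=sigma)
    below = universe.cdf(lower_spec)
    above = 1.0 - universe.cdf(upper_spec)
    return below + above


# Hypothetical specification limits of 130 and 152 thousandths of an inch.
estimate = fraction_outside(140.6, 8.6, 2.326, 130.0, 152.0)
print(f"{100.0 * estimate:.2f} per cent outside specification")
```

When the specification limits coincide with the natural tolerance limits
the function returns about 0.0027, the familiar three-sigma figure.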

The percentage outside specification limits could also be estimated
from the percentage in the sample which is outside specification limits,
but recent work of Baker [17] indicates that this estimate usually is
much more variable. (The basic assumption of normality both here
and in the three preceding paragraphs should be noted.)

In case the expected percentage outside specification limits is too 
great, a change in the process mean or variability (or both) should be 
made. If the control point has been placed at a stage in fabrication 
where the universe is made up from the combined product of several 
machines or production lines which may be operating at different levels 
with different amounts of variation, then it may be possible to improve 
the process by setting up separate controls for the components before 
they are mixed. 

It would seem unnecessary to dwell on the fact that no process is 
satisfactory where assignable causes of variability have not been dis- 
covered and evaluated. Unless action is taken when the preliminary 
analysis points to the existence of such causes there seems to be no 
good reason to collect data in order of production. Probably it would 
be better to take a single random sample from a week’s production, 
estimate the mean and variance, and, if the product has met past
requirements, use the estimated parameters in taking measures to
preserve the status quo in the future. Of course, this procedure, based on
the old adage, “Never trouble trouble until trouble troubles you,”
provides little help, either for avoiding trouble in advance or for process
improvement.

A variety of statistical techniques can be useful in detecting the type 
of assignable cause, not only at this stage, but also when controlling 
quality during production. An excellent paper by Olmstead [18] dis- 
cusses the following types: “(1) Gross error or blunder (shift in an in- 
dividual), (2) Shift in average or level, (3) Shift in spread or variability, 
(4) Gradual change in average or level (trend), (5) A regular pattern 
of change in level (cycle).” He describes the statistical tests he has 
found helpful in classifying the cause by type. Having detected the 
type, the job of finding the exact cause still remains, and here, in the
author’s opinion, the quality control engineer becomes the senior
partner in the joint endeavor with the statistician. From his previous
knowledge of like processes, from the experience of his industrial
associates, and from his engineering knowledge and intuition, he must
supply the “hunches.” The statistician can help to select or devise
suitable tests for choosing between hypotheses, or, if necessary, can
give valuable assistance in the design and analysis of laboratory
experiments; but, except in unusual cases, he cannot do the whole job
alone. Furthermore,
when the cause has been determined, the quality control engineer must
take the chief responsibility for getting action; and, as veterans in the
field know from bitter experience, this is no mean assignment.


CONTROLLING QUALITY DURING PRODUCTION


After past data have been analyzed and the process brought into 
control at a satisfactory level, standard values for the process param- 
eters are adopted and limits based on these parameters are calculated 
for use in controlling quality during future production. Alternatively, 
standard values are based directly on specification limits with no seri- 
ous advance attempt to stabilize the process and determine its ca- 
pabilities. In general, the latter method is less efficient and complicates 
the problem of deciding on the proper action when points fall outside 
control limits. 

Since the rules of the game require action only when a sample point
falls outside the control band for average or for range, the quality
control engineer is concerned with two questions: (1) What is the
probability of looking for trouble when none exists? and (2) What is the
probability of failing to look for trouble when it does exist?

Help in answering these questions is given in a paper [19] by Scheffé. 
He has outlined the method of calculating the operating characteristics 
of the average and range charts and has provided a set of figures which 
ought to be “blown up” and tacked to the wall above every quality 
control engineer’s desk. His paper is a fine illustration of the kind of 
valuable contribution a statistician can make to the more effective 
utilization of present statistical methods in industry. 
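
The flavor of such a calculation can be conveyed by a simplified sketch
for the average chart alone. It assumes a normal universe, a known
standard deviation, and three-sigma limits. It is not a reproduction of
Scheffé’s figures, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()


def prob_point_outside(shift, n=5, width=3.0):
    """Chance that one sample average falls outside the control limits.

    shift is the displacement of the process mean in units of the
    universe standard deviation; n is the sample size.
    """
    d = shift * sqrt(n)   # displacement in standard errors of the average
    below = STANDARD_NORMAL.cdf(-width - d)
    above = 1.0 - STANDARD_NORMAL.cdf(width - d)
    return below + above


for shift in (0.0, 0.5, 1.0, 1.5, 2.0):
    signal = prob_point_outside(shift)
    # With no shift, signal is the chance of looking for trouble needlessly;
    # otherwise 1 - signal is the chance of failing to look for it.
    print(shift, round(signal, 4), round(1.0 - signal, 4))
```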


CHARTS FOR DEFECTIVES AND DEFECTS 


While the three stages in the application of control charts based on 
measurements can be paralleled in the case of control charts for fraction 
defective or for defects per unit, it is doubtful whether many quality 
control engineers follow such procedure. Usually, after about twenty- 
five data points are plotted and the center line located, the level is so 
much higher than expected that all concerned go to work on the prob- 
lem of reducing it. 
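
For reference, the center line and limits of a chart for fraction
defective might be computed as below. The paper does not state the
formula; the usual three-sigma binomial limits and a constant sample size
are assumed, and the counts and function name are hypothetical.

```python
from math import sqrt


def p_chart_limits(defectives, sample_size):
    """Return (lower limit, center line, upper limit) for a p chart."""
    p_bar = sum(defectives) / (len(defectives) * sample_size)
    spread = 3.0 * sqrt(p_bar * (1.0 - p_bar) / sample_size)
    # A fraction defective cannot lie below zero or above one.
    return max(0.0, p_bar - spread), p_bar, min(1.0, p_bar + spread)


# Hypothetical counts of defectives in successive samples of 100 units.
counts = [4, 6, 5, 7, 3, 5, 6, 4]
print(p_chart_limits(counts, 100))
```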

At first, at least, psychology plays the leading role. At stated periods 
new averages are calculated and, if the general level has dropped, there 
is cause for elation. But after errors due to carelessness, lack of expe- 
rience and unsatisfactory working conditions have been reduced to a 
minimum the quality control engineer usually has difficulty in making 
further improvement in the general level unless he can use control 
charts for measurements to supplement his information. 


FURTHER COMMENTS ON CONTROL CHARTS 


A considerable part of this paper has been devoted to a discussion 
of the control chart because the statistician who is planning to help the
quality control engineer needs to acquire a thorough understanding of
how and why it works on production. The statistician who, because of 
honest conviction or lack of appreciation of production problems, labels 
the control chart as a poor tool, alienates the quality control engineer 
and impedes progress in the use of all statistical methods. Furthermore, 
it is ridiculous to withhold appreciation of a philosophy and system 
which, in such a great variety of industrial fields, has been used and is 
being used as an aid in making large reductions in manufacturing costs 
and substantial improvements in product quality. 

On the other hand, the quality control engineer must realize that 
the control chart technique, in the simple form incompletely outlined 
above, may not solve all of his problems satisfactorily. In the first 
place, sampling in time order is only one of many possible ways of 
collecting information for process control and, in some cases, help is 
needed from other methods. Second, once in a while the assumption of 
normality basic to the formulas for control limits for measurement 
charts may be such a bad approximation that new formulas are needed. 
Third, an analysis of costs and potential values may dictate the use of 
alternative methods which either may be quicker and less rewarding 
or may be more elaborate and wring greater information from expen- 
sive data. 

Statistical quality control in the large is more than control charts and
acceptance sampling; it is the application of all statistical methods in the
improvement of the manufacturing operation. The worthy statistician 
recognizes the proper areas of application for the various methods. He 
has had time to learn the theoretical bases of the methods and, there- 
fore, knows their limitation. The worthy quality control engineer is 
constantly expanding his statistical knowledge and, whenever neces- 
sary, is not ashamed to enlist the help of the statistician. 


INSPECTION PROBLEMS 


There appears to be a steady increase in the use of sampling inspec- 
tion in industry. Most of it is based on attributes but the interest in 
acceptance sampling by variables is growing. Among the causes are, 
(1) favorable experience with sampling inspection during the last war, 
(2) use of sampling by government agencies, (3) necessity to make bet- 
ter utilization of labor, (4) increase in the number of industrial people 
acquainted with sampling procedures, and (5) more sources of printed 
information on sampling plans and the details of their administration. 
(Some of the recent publications in this category are [20], [21], [22], as 
well as several chapters of [16].) 

Many quality control engineers are faced with the necessity of
choosing sampling plans, putting them into operation, and then gauging
their success. The first and third of these obligations require a fair 
amount of technical statistical knowledge. Among the factors which 
influence the choice of a plan are (1) consumer protection, (2) producer 
protection, (3) economy and (4) simplicity. For any individual plan 
for lot-by-lot acceptance, consumer and producer protection can be 
evaluated from the operating characteristic and average outgoing 
quality curves for the plan. The question of economy can best be an- 
swered by study of the curves for average sample number and for per- 
centage of inspection. 
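
As an illustration, the two curves for a single sampling plan by
attributes can be sketched under assumptions that the paper does not
spell out: binomial sampling from the process, and rejected lots screened
100 per cent with every defective found replaced by a good unit. The plan
(sample of 50, acceptance number 1, lots of 1000) and the function names
are hypothetical.

```python
from math import comb


def prob_accept(p, n, c):
    """Probability of accepting a lot when the process fraction defective is p."""
    return sum(comb(n, d) * p ** d * (1.0 - p) ** (n - d) for d in range(c + 1))


def average_outgoing_quality(p, n, c, lot_size):
    """Average outgoing fraction defective under rectifying inspection."""
    # Accepted lots pass on their uninspected remainder; rejected lots are cleaned.
    return p * prob_accept(p, n, c) * (lot_size - n) / lot_size


for p in (0.005, 0.01, 0.02, 0.04, 0.08):
    print(p,
          round(prob_accept(p, 50, 1), 3),
          round(average_outgoing_quality(p, 50, 1, 1000), 4))
```

Reading down the output gives points on the operating characteristic
curve (consumer and producer protection) and on the average outgoing
quality curve.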

Where the quality characteristic is measurable, acceptance sampling 
by variables gives considerable economy in amount of sampling. Use of
plans of this category is particularly to be recommended when the
inspection damages or destroys the unit inspected. Discussion of
sampling plans based on measurements is given in [16, Chap. I], [21] and
[23].

The statistician can be very helpful to the quality control engineer 
in choosing and analyzing sampling plans of the simple types mentioned 
above. For some of the problems involved in sampling bulk materials, 
such as coal or cement, the combined knowledge and experience of 
both engineer and statistician may be severely taxed. Problems of
inspection for visual characteristics may be even more difficult since
visual inspection often depends largely on the inspector’s judgment.

One further remark on inspection cannot be avoided. Many manu- 
facturers hesitate to abandon 100% inspection. If it is necessary to have 
this amount of inspection then it is worth while to take measures to 
insure its maximum efficiency. There are various statistical procedures
which can be used to help in approaching this goal. A simple way to 
test the efficiency of 100 per cent inspection is to follow it by a sampling 
inspection, preferably plotting results on a control chart. Then effi- 
ciency can be improved by re-training inspectors, by improving work- 
ing conditions, by supplying accurate gauges, etc. 

Decisions as to which inspectors are inefficient, what lighting is best, 
or which gauges are most accurate, should be made partly, at least, on 
the basis of quantitative evidence, properly interpreted. It is possible 
to design experiments to supply the answers to these questions. In fact 
it may be feasible to set up a single experiment which will answer all of 
them at once. Brumbaugh and Noel in a recent paper [24] describe an 
experiment designed to discover the sources of laboratory variability 
of a chemical assay. “The variables were: (a) four analysts; (b) two 
successive days; (c) two independent samples.” The method used was 
the analysis of variance. Conclusion: “The experiment shows that
Analyst A is making some errors in technique from day to day.
Analysts B and C are doing poor work. Their cases require investigation.
Analyst D is quite capable of doing this work, but appears to be
careless.” The analogy to the subject under discussion seems obvious.


ANSWERS TO THE FIRST QUESTION 


What contribution can the statistician make to greater and more 
effective utilization of present statistical methods in industry? 

These are tentative answers:

1. Make a careful study of the principal methods now in use. Ascer- 
tain who is using them, how they are used, and why they are successful. 

The first part of this paper has suggested a few possible findings. 

2. Grasp every opportunity to help engineers in service to formulate 
their problems precisely, to recognize their statistical nature, to solve 
them intelligently, and to get acceptance for the solutions. 

It is remarkable how effectively the engineer and statistician can 
work together when they meet on a common level. Of course this re- 
quires that each recognize the limitations of knowledge of both and 
act accordingly. 

3. Talk to engineers about current statistical methods. 

During the past few years a number of statisticians have learned to 
talk to engineers. Recently the present author read a lecture [25] which
Youden presented to a group of non-statisticians in the Department of 
Agriculture. He discussed “How statistics improves physical, chemical, 
and engineering measurements” in such understandable and enticing 
fashion that his audience must have left the lecture with a firm resolve 
to investigate and use the methods outlined. Unfortunately the number 
of such expositors is too small to meet the need. Also it is possible that 
some statisticians do not care to “waste” their time in trying to reach 
a general audience, although this is hard to believe, in view of the 
value of such service. 

4. Write for engineers. 

Most of the engineers in production and inspection “took” calculus. 
They had no opportunity to study statistics. Yet, today, they feel the 
necessity not only to learn statistical methods, but also to acquire some 
of the theoretical background which brings better application. During 
the past dozen years, and particularly since the war, several books 
have been written with the statistical problems of industry in mind. 
Also, good expository papers have appeared in many journals. How- 
ever, more good books and careful papers are needed. 

5. Provide educational opportunities which are tailored to
engineering needs and problems.

Writing on “Statistical training for industry,” three years ago, Wilks
[26] remarked:

The greatest obstacle standing in the way of a wider introduction of 
modern statistical methods into industry at the present time is the lack of 
trained personnel and training facilities. Of the research personnel now in 
industry who are concerned with the application of statistical methods, 
virtually none received any statistical training in college. They have picked 
up what they know of the subject through short courses, reading, and
discussion. . . . The statistical problems which the future scientist or engineer
will encounter will cut across traditional lines. Therefore, in order that he 
may be properly equipped to deal with these problems, he should have a 
fairly broad statistical training. The training should cover not only statis- 
tical quality control methods as the term is now understood, but the design 
of experiments, analysis of variance, and many other topics. 


In the period since Wilks’ paper was written some progress has been 
made in providing more and better courses both for engineers in service 
and for undergraduates. However, much work remains to be done on 
this very pressing problem. 


AN ANSWER TO THE SECOND QUESTION 


What part can the statistician play in developing new statistical 
methods and in making them readily available? 

There is little doubt that the statistician who becomes familiar with 
production and inspection operations in any industry will speedily 
perceive a large number of statistical problems for which he has no 
ready solution. Upon investigation he probably will discover that many 
of them have been solved. Of the remaining number, he may be able 
to find a few which he can manage. If his theoretical results are inter- 
esting they will probably be published in a place where the engineer 
would never find them and in a form which would give a small hint of 
their possible application. It behooves the statistician, then, to recast 
his findings in a mold familiar to the engineer and to republish them, 
together with plentiful examples, in a magazine which the engineer 
reads. For the present, at least, if he has developed a method which 
can prove useful in several different industries he may need to exhibit 
it in the publications peculiar to those industries. 

The three questions now under consideration were sent to several of 
the author’s friends for comment. In connection with the second ques- 
tion, one wrote, in part, 

It seems to me . . . that effort must be spent on developing methods 
which will achieve ready shop acceptance because of their logic and sim- 
plicity. 


Another, from industry, wrote: 
Statisticians will, in any case, have to develop practical applications 
to new theory and “sell” them to industries. Most engineers and industrial 
plants are too busy . . . to delve into theory and develop their own practical 
applications. 


ANSWERS TO QUESTION THREE 


How can the quality control engineer get the maximum benefit from 
the work of the statistician? 

The general answer is this: By recognizing the fact that the statisti- 
cian has something important to contribute and by “going after it.” 

Some quality control engineers believe that the statistician can not 
make any useful contribution to the improvement of production or 
inspection operations. In spite of the demonstrated values of statistical 
quality control and acceptance sampling they would like to relegate the 
statistical aspects of these techniques to a minor position. This is an 
unfortunate, but natural, attitude which would be maintained toward 
any technique which was different and not completely understood. 
Luckily this group of quality control men is decreasing. 

Many of the engineers who are using statistical methods realize 
that their work would be more effective if they had a more thorough 
knowledge of statistics. Many of them are persisting in acquiring that 
knowledge. The engineer gets help from the statistician in one or more 
of the following ways: 

1. By conferring with a statistician about some knotty plant prob- 
lem. 

He may find that the statistician knows little about the technical 
background of the problem and that the statistician talks a language 
which is unintelligible. However, if he is willing to educate the statisti- 
cian, then the statistician will reciprocate and eventually the problem 
is solved. Often such a conference leads to the acquisition of useful 
references, which the statistician may help the engineer to understand. 

2. By taking courses in statistics. 

A number of colleges in industrial areas offer evening courses in 
theoretical and applied statistics, designed especially for engineers. 
Several institutions annually give intensive courses on statistical qual- 
ity control of the type found useful during the war. Others provide 
short intensive courses or conferences on more advanced topics such as 
tests of significance, sequential sampling, multiple regression, and an- 
alysis of variance. (For more information see [27], [28].) Where these 
opportunities for service have been overlooked, it would not be impos- 
sible for a group of engineers to agree that they needed a course in 
statistics, locate a statistician willing to teach it, and then ask a uni- 
versity to authorize the course. This probably would result in a course 
geared at a suitable level to meet their needs. 
3. By trying to read what the statistician writes. 

Many quality control engineers probably cannot read some of the 
papers now being published in journals and books devoted to statistical 
theory. However, most of these publications have certain parts which 
the engineer can read. It may be only a note or an illustrative example 
but that bit of information or new idea may prove valuable. Further- 
more, by persistence and study, the average engineer will find that 
he can get more and more help and inspiration from these materials. 

Some statisticians are interested in developing “quick and dirty” 
methods. Many of these can be useful in industry. However, most of 
them appear in journals years before they reach the text books. It is 
worth while to learn about these methods early. As a small example it 
seems useful for the quality control engineer to know that in a recent 
issue of the Annals of Mathematical Statistics, Link [29] has tables which 
can be used to help to decide whether one lot is more variable than 
another by comparing sample ranges. When the lots are large and have 
distributions which are approximately normal, then if they have the 
same standard deviation it is unusual for the ratio of the ranges, for two 
samples of ten measurements each, to be less than 0.48 or greater than 
2.1. (The probability is less than one-twentieth.) This test seems to be 
nearly as good as the F-test and, without question, it is a lot quicker. 
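
The limits quoted above can be checked roughly by simulation. The 
following Python sketch is not Link's tabulation; it simply draws pairs 
of samples of ten from one normal population and counts how often the 
ratio of ranges falls outside 0.48 to 2.1. The function name, the number 
of trials, and the seed are assumptions made for illustration. 

```python
import random


def range_ratio_tail_probability(n=10, lower=0.48, upper=2.1,
                                 trials=20000, seed=1):
    """Estimate P(ratio of sample ranges < lower or > upper) for two
    independent samples of size n drawn from the same normal population."""
    rng = random.Random(seed)
    outside = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Ratio of the two sample ranges (largest minus smallest value).
        ratio = (max(a) - min(a)) / (max(b) - min(b))
        if ratio < lower or ratio > upper:
            outside += 1
    return outside / trials


estimate = range_ratio_tail_probability()
print(f"estimated probability outside (0.48, 2.1): {estimate:.3f}")
```

Under these assumptions the estimate should fall in the neighborhood of 
one-twentieth, which is the order of magnitude stated above. 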


CONCLUSION 


In brief conclusion, then, it seems that there are many ways in which 
the quality control engineer and the statistician can work together to 
increase the use of statistical methods in production and inspection. 
Some have been suggested in this paper. Engineers and statisticians, 
who are vitally interested in the goal set forth here, can find many 
additional ways in which to join their efforts toward attaining it. 
SOME STATISTICAL PROBLEMS IN 
SMALL GROUP RESEARCH* 


Robert F. Bales 
Harvard University 


INTRODUCTION 


For several years at the Laboratory of Social Relations at Harvard 
we have been engaged in the development of a method for the 
recording and analysis of social interaction.¹ By social interaction we 
mean the face-to-face conversation or behavior of two or more persons 
as they communicate with each other. There is a rapidly growing em- 
phasis in several of the social sciences on the microscopic study of social 
interaction in small face-to-face groups.² Children at play, classroom 
and discussion groups, committees, planning groups, work groups, 
therapy groups, and many others are being studied by direct observa- 
tion. Researchers face a need for standardized methods of observing, 
analyzing, and comparing the behavior which goes on in widely differ- 
ent sorts of groups under widely different sorts of conditions. Previ- 
ously separated lines of research and theorizing now seem to be con- 
verging toward a coherent body of techniques and theory. The basis 
of research and theorizing is less and less in terms of special interest 
in some concrete type of group, such as a particular interest in planning 
committees as they operate in administrative organizations, or a par- 
ticular interest in classroom groups. The conviction is rapidly growing 
that there can be a generalized theory of small group structure and 
process. The development of generalized theory of any real power is 
inevitably linked up with the development of standardized and 
generally applicable methods for making the measurements required 
by the body of theory. The present method which we call “interaction 
process analysis” is aimed at this goal. The method is general, in that 
it can be used no matter what concrete subject matter or task is the 
concern of the group, and produces measures of a system of theoreti- 
cally relevant variables. 

The Laboratory has built and equipped a special room in which 
groups can meet and be observed through a set of large one-way mirrors 
from a sound insulated observation room. The observation room is 
equipped with sound recording apparatus and a monitoring speaker 
so that the observers can hear what is going on in the other room. The 
observers are provided with two sets of apparatus, called Interaction 
Recorders, which present a paper tape moving at a constant speed, 
upon which the scores according to the method can be written in 
sequence as the interaction proceeds. The two recorders make possible 
detailed checks of reliability. 

Quite a number of the groups we have used for the development of 
our method of observation have been special groups brought together 
in the experimental room to solve problems which we have set for them. 
Others have been committees or work groups in their natural settings. 
One study of two series of group therapy sessions has been completed. 
In still other cases we have worked with typewritten verbatim protocols 
of interviews. One study has been done of married couples reconciling 
certain differences between husband and wife which appeared on the 
answers to questionnaires they had filled out ahead of time. This 
study included married couples from four different cultural settings, 
including ten Navaho couples. It is probably fair to say, however, 
that up to the present time, our main body of experience and our 
main theoretical concern has been with what could be called decision- 
making or problem-solving conferences of persons of our own culture, 
ranging in size from two to ten. We do a good deal of our research 
on observations already collected, and for many problems do not have 
to set up new experimental designs. In all, our collected body of data 
includes more than 100,000 scores obtained from observations of around 
150 groups. 

Since the method aims at generalized applicability, it might be said 
that we are trying to create a body of small group statistics, regarding 
the frequency of occurrence of given types of behavior in small groups, 
such as agreement, disagreement, and the like, under varying sorts 
of conditions. In this task, we face many of the same sorts of problems 
(standardization of nomenclature, development of indices and sum- 
mary measures, statistical methods of inference from measures), 
faced by any investigator who tries to gather comparable sorts of data 
over a wide range of conditions, and draw reasonable inferences from 
them. At the same time, we are confronted with certain special prob- 
lems of statistical treatment arising out of the nature of our method 
and data. It may be that some of these problems, arising out of a par- 
ticular substantive field, will be of more general interest to statisticians. 

In order to illustrate how some of our statistical problems arise, let 
me describe briefly, without proper qualification, how the observations 
are made, some of the ways of tabulating the data, and how the scores 
tend to fall. After outlining these ways of tabulating the data for exami- 
nation, I will indicate some of the statistical difficulties. 


DESCRIPTION OF THE METHOD 


The observer has before him the following twelve categories: 


 1 SHOWS SOLIDARITY, raises other’s status, gives help, reward. 
 2 SHOWS TENSION RELEASE, jokes, laughs, shows satisfaction. 
 3 AGREES, shows passive acceptance, understands, concurs, complies. 

 4 GIVES SUGGESTION, direction, implying autonomy for other. 
 5 GIVES OPINION, evaluation, analysis, expresses feeling, wish. 
 6 GIVES ORIENTATION, information, repeats, clarifies, confirms. 

 7 ASKS FOR ORIENTATION, information, repetition, confirmation. 
 8 ASKS FOR OPINION, evaluation, analysis, expression of feeling. 
 9 ASKS FOR SUGGESTION, direction, possible ways of action. 

10 DISAGREES, shows passive rejection, formality, withholds help. 
11 SHOWS TENSION, asks for help, withdraws “Out of Field.” 
12 SHOWS ANTAGONISM, deflates other’s status, defends or asserts self. 


Each category is defined in detail, both in relation to each of the 
other categories, and in terms of more specific concrete examples. The 
set of categories is thought of as something more inclusive and system- 
atic than a mere “list.” We treat it as if it were logically exhaustive 
of all possibilities on its own level of abstraction. This means that 
every possible act is regarded as properly classifiable into one of the 
twelve categories. Furthermore, all of the categories are positively 
defined, that is, none of them is treated as a residual or wastebasket 
category. Each act is scored in one and only one positively defined 
category. 

The observer watches the behavior of the participants and listens 
to what they say. First one talks, then another. When the observer 
classifies a given act, he records it in the proper category by writing 
down two numbers, the first the identification number of the person 
speaking, and the second the identification number of the person spoken 
to. For example, suppose two persons are talking together, identified 
as 1 and 2. Person 1 says, “What time is it?”. The observer decides 
that this is an act of “Asking for Orientation,” Category 7. He writes 
down the numbers 1-2 in the space following Category 7. Person 2 
now says, “I have just twelve o’clock.” The observer writes down the 
numbers 2-1 in the space following Category 6, “Gives Orientation.” 
Person 1 then says, “Thank you,” and the observer puts down the 
numbers 1-2 in the space following Category 1, “Shows Solidarity.” 
The observer continues to score in this sequential connected fashion 
as long as the interaction lasts, or until some arbitrary stopping point. 
Each time he enters a score, he thus records three items of information, 
a quality classification of the act, who performed it, and toward whom. 
For some of our work we record the scores on a moving paper tape, 
and thus add a fourth item of information: the position of the act in 
a continuous sequence. 
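
The four items of information carried by each score can be pictured as 
a small record. The following Python sketch is only an illustration of 
the example just given; the names Score, record, and tape are 
hypothetical and are not part of the method as described. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    position: int  # place of the act in the continuous sequence
    category: int  # quality classification, 1 through 12
    speaker: int   # identification number of the person speaking
    target: int    # identification number of the person spoken to


def record(tape, category, speaker, target):
    """Append one score to the tape, numbering it by its order of entry."""
    if not 1 <= category <= 12:
        raise ValueError("category must be between 1 and 12")
    tape.append(Score(len(tape) + 1, category, speaker, target))


tape = []
record(tape, 7, 1, 2)  # Person 1: "What time is it?"           (Category 7)
record(tape, 6, 2, 1)  # Person 2: "I have just twelve o'clock." (Category 6)
record(tape, 1, 1, 2)  # Person 1: "Thank you."                  (Category 1)

for score in tape:
    print(score)
```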

This description of the scoring procedure makes it clear that each 
time the action changes hands, a new unit starts. Often, however, a 
person continues to talk for some time. In the example above, Person 
2 might say, “I have just twelve o’clock. But my watch has been 
stopping lately. I don’t know whether that is the right time or not.” 
The observer would enter three units for this sequence, one for each 
sentence. If the person had managed to say this all in one sentence, 
the observer still would have broken it down into three scores, since 
there are essentially three items of information, or logical points 
conveyed. The unit we use is thus close to a “single proposition” or 
“subject-predicate” unit, although this is not a perfectly satisfactory 
definition for all of the categories. Operationally we define the unit as 
the smallest discriminable item of behavior that will satisfy the defi- 
nition of one of the categories, or to put it in a slightly different way, 
meaningfully complete enough to support an interpretation by the 
observer, or a reaction of the other person in the conversation. The 
unit is defined then, in terms of a change of meaning within a system 
of symbols used in communication, not in terms of time or space or 
gross physical movement. Nevertheless, the unit is relatively so small 
that it turns out there is usually quite a close correspondence between 
the number of scores we record for a person and the total chronological 
time he spends speaking. On the average, we obtain between ten and 
twenty scores per minute, or something over six hundred scores per 
hour on most interaction we have scored. 

One difficulty with the unit we use is that in spite of our best efforts 
we have not been able to define it well enough so that all observers 
come out with the same number of scores. Part of the discrepancy can 
be attributed to more or less mechanical difficulties of the operation, 
and training of observers helps considerably. A more fundamental 
difficulty, however, is that the operation which yields a score is one 
of inference and interpretation, and not simply one of counting. There 
is no way of obtaining an absolutely “correct” answer, as one might 
if he were counting marbles or people. The observer is counting “events” 
which are intimately interwoven in an overlapping way in a complex 
process. The difficulties are still further aggravated by the fact that the 
events cannot be repeated for a recount, at least not in a complete 
sense. We typically take sound recordings of all the interaction we 
score, but the complete who-to-whom part of the scoring cannot be 
done from the sound record. 

Finally, there is the difficulty of the speed with which the actual 
interaction moves. Ideally, the observer is supposed to record every 
act, but even highly trained observers miss some of the scores, and get 
only a large sample of the number and quality of the scores they might 
have obtained if they had been omniscient and infinitely fast. The 
discrepancies between two observers scoring the same original interac- 
tion are probably not all attributable to differences in judgment— 
some of the differences are probably attributable to the fact that they 
have taken slightly different large samples from the total population 
of acts which could have been scored in the given session. 


WAYS OF ANALYZING THE OBSERVATIONS 


When we finish with the observation of a meeting, there are several 
things we typically want to do. First we want to compare the scorings 
of two or more observers to see whether the data are reliable enough 
to use. Second, we want to compare the given meeting with other 
meetings, say of control groups, or with some hypothetical construc- 
tion as to the distribution of our scores expected under given condi- 
tions. In either case, the major statistical problems we face are those 
of finding appropriate tests of significance of differences. 

We are interested in the kinds of differences which appear under 
different experimental conditions. But we are also interested in the 
approximate uniformities of process which seem to appear in spite of 
rather wide differences in experimental conditions. This is a major 
focus of interest, and we want to make our observations bear on this 
area of similarities as well as on differences. 

Profiles: One of the obvious ways of examining the data is to find the 
distribution of total number of acts between the twelve categories, 
according to quality.³ A distribution of this kind either in raw scores, 
or in percentage rates based on the total, we call a “profile.” Thus we 
may make profiles of total meetings, of sub-periods within meetings, 
or of each member separately. We may make profiles of the activity 
observed by different observers of the same meeting. The shape of 
the quality profile is one of our most useful summary measures. Our 
experience indicates that different kinds of groups operating under 
different kinds of conditions produce different types of profiles. For 
example, some groups disagree more than others. Some groups joke 
and laugh more than others. The profiles we see, however, are not com- 
pletely and radically different from each other. For example, what 
we call “Attempted Answers,” that is giving orientation, giving opin- 
ion, and giving suggestion, are nearly always more numerous than their 
corresponding “Questions,” that is, asking for orientation, opinion, 
or suggestion. Similarly, “Positive Reactions,” that is, agreement, 
showing tension release and solidarity, are typically more numerous 
than “Negative Reactions,” showing disagreement, tension, and 
antagonism. Indeed, it is hard to see how groups could continue to 
operate without breaking up if there were more questions than answers 
and more negative reactions than positive. On the average, for groups 
we have examined, Attempted Answers account for a little over 50 
per cent of the total activity, with Questions, Positive Reactions and 
Negative Reactions accounting for the other half. These tendencies 
appear to us to be more or less “built into” interaction that is pur- 
poseful and goal-directed, if indeed it is “getting anywhere” and pro- 
ducing “satisfaction” even in a minimal way. We are inclined to believe 
that the interesting differences we see between groups tend to be rela- 
tively minor variations of this general pattern, and that radically dif- 
ferent patterns are rare. 
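
A profile of the kind described can be tabulated in a few lines. The 
following Python sketch counts acts by category, converts the counts to 
percentage rates based on the total, and sums the rates within the four 
broad sections named above. The list of scored categories is invented 
for illustration and is not taken from any observed meeting; the 
function names are likewise assumptions. 

```python
from collections import Counter

# Grouping of the twelve categories into the four sections named in the text.
SECTIONS = {
    "Positive Reactions": (1, 2, 3),
    "Attempted Answers": (4, 5, 6),
    "Questions": (7, 8, 9),
    "Negative Reactions": (10, 11, 12),
}


def profile(categories):
    """Return (raw counts, percentage rates) over the twelve categories."""
    if not categories:
        raise ValueError("no scores to tabulate")
    counts = Counter(categories)
    raw = {c: counts.get(c, 0) for c in range(1, 13)}
    total = len(categories)
    rates = {c: 100.0 * raw[c] / total for c in raw}
    return raw, rates


def section_rates(rates):
    """Sum the percentage rates within each of the four sections."""
    return {name: sum(rates[c] for c in cats) for name, cats in SECTIONS.items()}


# Hypothetical sequence of category numbers for one short meeting.
scored = [7, 6, 1, 5, 3, 5, 10, 6, 3, 4, 2, 6, 5, 3, 8, 5, 11, 6, 3, 4]
raw, rates = profile(scored)
sections = section_rates(rates)
print(sections)
```

Because every rate is computed on the common total, the twelve rates 
necessarily add to 100 per cent. 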

Sequences: When we look at the way in which specific acts tend to 
lead to other specific acts, these notions are strengthened. When we 
take out the effects of the gross over-all frequency, and also tendencies 
toward repetition in the same category, the mechanisms by which 
the gross distribution arises become apparent. Questions tend to lead 
to Attempted Answers. These in turn tend to lead to Positive Reac- 
tions. When Negative Reactions appear, they tend to lead back to 
more Attempted Answers. We are not surprised to find that Questions 
seldom lead directly to either Positive or Negative Reactions, nor that 
Positive Reactions seldom lead directly to Negative Reactions, or 
vice versa. Again, the interesting differences between groups would 
appear to be in the relatively minor variations of this general pattern. 
The pattern itself as a general type of movement seems to be charac- 
teristic of successful process in a very fundamental way. 

3 Bales, Robert F., “A Set of Categories for the Analysis of Small Group Interaction,” American 
Sociological Review, Vol. XV, No. 2, April, 1950. 
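
The sequence tabulation can be sketched as follows. The text does not 
say exactly how the effects of over-all frequency and of repetition are 
taken out, so this Python sketch makes two assumptions: it drops 
successive acts that stay in the same section, and it divides the 
observed chance of each step by the over-all share of the section 
stepped to. A value above 1 then means that the step occurs more often 
than gross frequency alone would suggest. The sequence is invented. 

```python
from collections import Counter

SECTION_NAMES = ("Positive", "Answer", "Question", "Negative")


def section_of(category):
    """Map a category number (1-12) to its broad section."""
    if not 1 <= category <= 12:
        raise ValueError("category must be between 1 and 12")
    return SECTION_NAMES[(category - 1) // 3]


def transition_lift(categories):
    """Ratio of P(next section | present section) to the over-all share of
    the next section, computed over steps that change section."""
    sections = [section_of(c) for c in categories]
    pairs = [(a, b) for a, b in zip(sections, sections[1:]) if a != b]
    if not pairs:
        raise ValueError("no changes of section in this sequence")
    pair_counts = Counter(pairs)
    from_counts = Counter(a for a, _ in pairs)
    to_counts = Counter(b for _, b in pairs)
    n = len(pairs)
    return {
        (a, b): (count / from_counts[a]) / (to_counts[b] / n)
        for (a, b), count in pair_counts.items()
    }


# Hypothetical sequence of category numbers.
sequence = [7, 6, 3, 8, 5, 3, 5, 10, 5, 3, 7, 6, 1]
lift = transition_lift(sequence)
for step, value in sorted(lift.items()):
    print(step, round(value, 2))
```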

Phases: Another way of tabulating the data is by time periods within 
the single meeting. We have explored changes in quality of activity as 
groups move through time in attempting to solve problems.⁴ These 
changes in time we call “phase patterns.” The pattern of “phases” 
differs in detail under different conditions. However, these changes 
in quality seem to be subject to system-influences which produce 
similarities from group to group. An increase of task-oriented activities, 
that is, Questions and Attempted Answers, seems to constitute a 
disturbance of a “balance” of some kind in the system, and tends to 
be redressed later by an increase in social-emotional activities, that is, 
both Positive and Negative Reactions. 

Part of our observations have been kept by time sequence. We have 
divided each available meeting (about 22 cases, totaling about 14,000 
scores) into three equal parts, and have calculated the amount of 
each type of activity in each part of each meeting. We also divided 
the meetings into two kinds: those which were dealing with what we 
called “full-fledged problems” (essentially problems of analysis and 
planning with the goal of group decision), and those dealing with 
more “truncated” or specialized types of problems. Those groups 
dealing with full-fledged problems tended to show a typical phase 
movement: the process tended to move qualitatively from a relative 
emphasis on attempts to solve problems of orientation (“what is it?”) 
to attempts to solve problems of evaluation (“how do we feel about it?”) 
and subsequently to attempts to solve problems of control (“what 
shall we do about it?”). Concurrent with these transitions, the relative 
frequencies of both negative reactions (disagreement, tension, and an- 
tagonism) and positive reactions (agreement, tension release, and 
showing solidarity) tend to increase. 
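
The division into thirds can be sketched in Python as follows. Two 
points are assumptions rather than statements of the text: the meeting 
is cut into three parts of equal numbers of scores (not equal clock 
time), and the problems of orientation, evaluation, and control are 
represented by categories 6 and 7, 5 and 8, and 4 and 9 respectively. 
The sequence of scores is invented for illustration. 

```python
# Assumed pairing of categories with the three types of problem.
PROBLEMS = {
    "orientation": {6, 7},
    "evaluation": {5, 8},
    "control": {4, 9},
}


def split_into_thirds(sequence):
    """Cut a sequence of scores into three parts of (nearly) equal length."""
    n = len(sequence)
    cut1, cut2 = n // 3, (2 * n) // 3
    return [sequence[:cut1], sequence[cut1:cut2], sequence[cut2:]]


def phase_rates(sequence):
    """Percentage of each part's acts devoted to each type of problem."""
    table = []
    for part in split_into_thirds(sequence):
        if not part:
            raise ValueError("meeting too short to divide into three parts")
        table.append({
            name: 100.0 * sum(1 for c in part if c in cats) / len(part)
            for name, cats in PROBLEMS.items()
        })
    return table


# Hypothetical meeting of eighteen scores.
meeting = [7, 6, 6, 3, 6, 5,
           5, 8, 5, 10, 5, 3,
           4, 9, 4, 3, 4, 1]
phases = phase_rates(meeting)
for number, part in enumerate(phases, start=1):
    print(number, {name: round(rate, 1) for name, rate in part.items()})
```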

Matrices: Another type of tabulation we have explored is the way in 
which participation is distributed between members.⁵ The pattern of 
distribution is different in detail under different conditions. However, 
in spite of these differences, the distribution pattern of total amounts 
of participation of each member, as well as the pattern of who talks 
how much to whom, seems to be subject to system-influences, of an 
economic-ecological kind, which tend to produce similarities from 
group to group. We have collected all our data in terms of group size 
and type of group, nature of conditions, etc., and have developed a 
way of representing the total number of different possible combina- 
tions of who is speaking and to whom which we call a “matrix.” We 
have noted that the matrix differs in apparently reasonable ways under 
different conditions. Groups with no designated leader, for example, 
tend to have more equal participation than groups with designated 
leaders of higher status. 

4 Bales, Robert F., and Strodtbeck, Fred L., “Phases in Group Problem-solving,” Journal of 
Abnormal and Social Psychology (in press). 

5 Bales, Robert F., Strodtbeck, Fred L., Mills, Theodore M., and Roseborough, Mary L., “Chan- 
nels of Communication in Small Groups,” American Sociological Review (in press). 

In spite of wide variations in conditions, we have noted strong tend- 
encies toward similarities in the matrices. The following generaliza- 
tions have been found to hold with a high degree of regularity: If we 
arrange the personnel in rank order according to the total amounts 
they speak, we then find that the amounts which they receive are 
arrayed in the same rank order. And more specifically, each man tends 
to speak to each other in an amount which is directly related both to 
his own rank and the rank of the other. The top man in groups larger 
than five or so tends to speak considerably more to the group as a 
whole than to specific individuals in the group. All other members 
tend to speak more to specific individuals (and particularly to the top 
man) than to the group as a whole. As the groups increase in size, a 
larger and larger proportion of the activity tends to be addressed to 
the top man, and a smaller and smaller proportion to other members. 
In turn, as size increases, the top man tends to address more and 
more of his remarks to the group as a whole, and to exceed by larger 
amounts his proportionate share. There seems to be a top ceiling for 
him however, around 50 per cent, possibly connected with a general 
tendency for the interaction to come to a system-closure, such that 
each “action” as it were, tends to be countered with a “reaction” 
from some other. Even if the top man is giving most of the orientation, 
opinion, and suggestion to the group as a whole, he still may expect to 
receive a “backwash of reactions,” both of a positive and negative 
sort, that will tend to equal the amount of action he proposes. There 
is thus an intimate relation between the typical profile, the typical 
phase movement, and the typical matrix. 
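
The first of these generalizations can be checked mechanically once the 
who-to-whom counts are tabulated. The following Python sketch builds a 
matrix from speaker and target numbers, ranks the members by the amount 
they initiate, and compares that order with the order of amounts 
received. The counts are invented, and the use of 0 to stand for 
remarks addressed to the group as a whole is an assumption made for 
this illustration. 

```python
from collections import defaultdict

GROUP = 0  # assumed code for acts addressed to the group as a whole


def who_to_whom(pairs):
    """Tabulate counts of acts by (speaker, target)."""
    matrix = defaultdict(lambda: defaultdict(int))
    for speaker, target in pairs:
        matrix[speaker][target] += 1
    return matrix


def rank_by_initiated(matrix):
    """Members in descending order of total acts initiated."""
    totals = {speaker: sum(row.values()) for speaker, row in matrix.items()}
    return sorted(totals, key=totals.get, reverse=True)


def received(matrix):
    """Total acts received by each member from specific other members."""
    got = defaultdict(int)
    for row in matrix.values():
        for target, count in row.items():
            if target != GROUP:
                got[target] += count
    return dict(got)


# Hypothetical counts for a three-person group: (speaker, target) -> acts.
counts = {(1, 2): 5, (1, 3): 3, (1, GROUP): 6,
          (2, 1): 6, (2, 3): 2, (2, GROUP): 2,
          (3, 1): 4, (3, 2): 1, (3, GROUP): 1}
pairs = [pair for pair, n in counts.items() for _ in range(n)]

matrix = who_to_whom(pairs)
ranking = rank_by_initiated(matrix)
received_totals = received(matrix)
received_ranking = sorted(received_totals, key=received_totals.get, reverse=True)
print("initiated rank:", ranking)
print("received rank: ", received_ranking)
```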


STATISTICAL DIFFICULTIES 


The various tendencies I have described are empirical averages 
based on our total body of relevant data. There is no implication that 
the interaction process actually goes in the way described under all 
conditions. Quite the contrary. It is easy to show striking differences in 
profiles, phases, and matrix under differing conditions, although for 
reasons I will point out, we do not have appropriate measures of 
significance of differences. We do believe, however, that we will soon 
be able to specify the conditions under which these average tendencies 
appear, and that they will prove to be very general and common to a 
great many different situations. As soon as we are able to define the 
specific conditions in a way satisfactory to us, we intend to run a suffi- 
ciently large series of groups under these empirical conditions to find 
out something about the sampling distributions of our major measures. 

We have the notion that this particular set of conditions and the 
sampling distributions of our measures connected with them will 
prove to be a most advantageous baseline for further theory construc- 
tion. Once this baseline has been established, other sets of conditions 
expected to have different results can be described as modifications 
or accentuations or reversals of the more “simple” baseline conditions 
which produce regular gradations by time, members, and group size. 
In single-purpose control-group experimental-group designs the prob- 
lem of specifying in detail the way in which various conditions affect 
the total process can be by-passed, since one simply tries to hold all 
conditions constant with the exception of one, which is the experi- 
mental variable. In designs of this kind one is usually looking for one 
or a few types of effects, and does not necessarily raise questions about 
changes in the total distribution of a system of rates or measures. Since 
we believe that the interaction process has system characteristics, 
however, and are interested in these, we are attempting to find a system 
of descriptive measures which will describe the total state of the process 
with regard to these variables, and how this total state changes under 
various conditions. 

Since this is the kind of thing we want to describe, we run into special 
kinds of difficulties. First, because the process we are trying to measure 
seems to have a kind of organic character, in which parts are highly in- 
terdependent and take time to run through their full course or cycle, we 
do not feel that we can sample from the complete meeting, but have 
to take it as it is. Thus, the population we are sampling from for most 
purposes, except when we are concerned with reliability of observers, 
is a hypothetical population of cases, each of which is a complete 
meeting, not a population of single acts. In taking the meeting as the 
sampling unit, we are handicapped because we cannot control the size 
of our sample units. Some meetings yield 500 scores, some a thousand 
or two. Thus it results that usually two profiles or matrices we wish 
to compare are sufficiently different in the total number of scores that 
some correction has to be made. 

In developing our exploratory averages and comparisons it has been 
our practice to work out all internal cells of the profile or matrix as 
percentages of the total number of scores (whatever it happens to be) 
for the given meeting. This makes comparison of distributions possible 
even though the total number of scores is widely different, but 
obscures the factor of the size of our sample units. This difficulty can 
be overcome by multiplying the percentage rates on the profiles or 
matrices we are using as the base of comparison by the raw number 
total of the observed meeting before making comparisons. This proce- 
dure gets around the problem that the sizes of our sample units are 
different, but it does not get around the problem of the lack of inde- 
pendence of acts as they are distributed between cells. 
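
The correction for unequal totals amounts to a single multiplication. 
In the following Python sketch the base rates and the observed counts 
are invented for illustration, and the four-section grouping is used 
only to keep the example short; the same step applies to the twelve 
categories or to the cells of a matrix. 

```python
def expected_counts(base_rates_percent, observed_total):
    """Scale percentage rates from the base of comparison to the raw
    total of the observed meeting."""
    if abs(sum(base_rates_percent.values()) - 100.0) > 1e-6:
        raise ValueError("base rates must add to 100 per cent")
    return {name: rate / 100.0 * observed_total
            for name, rate in base_rates_percent.items()}


# Hypothetical base rates (per cent) and hypothetical observed counts.
base_rates = {"Positive Reactions": 25.0, "Attempted Answers": 56.0,
              "Questions": 7.0, "Negative Reactions": 12.0}
observed = {"Positive Reactions": 110, "Attempted Answers": 300,
            "Questions": 40, "Negative Reactions": 50}

expected = expected_counts(base_rates, sum(observed.values()))
differences = {name: observed[name] - expected[name] for name in observed}
for name in observed:
    print(f"{name:20s} observed {observed[name]:4d}  expected {expected[name]:6.1f}")
```

The expected counts now share the observed total, so the two 
distributions can be set side by side; the interdependence of the cells 
is untouched by this step. 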

Just to the extent that the actual process does have a system-char- 
acter, the acts falling in one cell affect the probabilities of acts falling 
in other cells. To recall only two examples of the kind of interdepend- 
ence we encounter, if an act is a Question, the probability that the 
next act will be an Attempted Answer increases; at the same time, 
when a given person asks a Question, the probability that a different 
person will attempt to answer it is increased. The statistical problem 
arises out of the fact that chi-square and similar tests of significance 
of difference assume independence, that is, that the occurrence of a 
particular item does not affect the probability that other items fall to 
various cells. This condition cannot be met for any of our major dis- 
tributions of measures, the profile, phase movement, or the matrix. 
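
The practical effect of violating the independence assumption can be 
illustrated by simulation. The following Python sketch is a deliberately 
simplified stand-in for the interaction process, not a model of it: 
there are only two categories, each with a long-run share of one half, 
and each act repeats the category of the preceding act with a fixed 
probability. A repeat probability of 0.5 gives independent acts; 0.8 
gives dependent ones. All names and numerical settings are assumptions. 

```python
import random

CRITICAL_5_PERCENT_1_DF = 3.841  # upper 5 per cent point of chi-square, 1 d.f.


def simulate_counts(n_acts, p_repeat, rng):
    """Counts in two categories when each act repeats the category of the
    preceding act with probability p_repeat."""
    state = rng.randrange(2)
    counts = [0, 0]
    for _ in range(n_acts):
        counts[state] += 1
        if rng.random() >= p_repeat:
            state = 1 - state
    return counts


def chi_square_equal_split(counts):
    """Chi-square for the hypothesis of equal shares in the two categories."""
    expected = sum(counts) / 2
    return sum((c - expected) ** 2 / expected for c in counts)


def rejection_rate(p_repeat, n_acts=400, trials=1000, seed=7):
    """Share of simulated meetings in which the true hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = sum(
        chi_square_equal_split(simulate_counts(n_acts, p_repeat, rng))
        > CRITICAL_5_PERCENT_1_DF
        for _ in range(trials)
    )
    return rejections / trials


independent = rejection_rate(0.5)
dependent = rejection_rate(0.8)
print(f"independent acts: rejected in {independent:.1%} of trials")
print(f"dependent acts:   rejected in {dependent:.1%} of trials")
```

With independent acts the true hypothesis is rejected at about the 
nominal rate; with acts that tend to repeat, it is rejected far more 
often, although the long-run shares are the same in both cases. 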

On the profile, which, it will be recalled, is a distribution of the 
number of acts falling in each of the twelve quality categories, the 
rate for each category is based on the grand total, and all rates add to 
100 per cent. Thus, if one type of activity increases, not only do the 
empirical probabilities of certain other types change, as just indicated, 
but because all are calculated on a common base, each other rate is 
decreased by a small amount. In addition to these difficulties, the 
rates are of widely different orders of magnitude. The rates for certain 
categories are of the order of 20 and 30 per cent, while those of others 
are of the order of 1 or 2 per cent. We do not know, on theoretical 
grounds, what the sampling distributions of measures like this look 
like, first because of their empirical interdependence as different parts 
of a systematic process, second because our way of measuring them 
forces us to represent them as percentage rates, and third, because 
of their great differences in magnitude. Presumably we can obtain 
empirical sampling distributions for a given set of conditions, once we 
settle on what we think is a sufficiently generalized and strategic set 
of conditions to invest the labor, but we should like to know whether 
there is a mathematical way of attacking the problem. 
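
The effect of the common base can be seen in a small numerical sketch. The counts below are hypothetical and are not data from our observations, and the name `profile_rates` is introduced only for this illustration. Only the first category is allowed to increase, yet every other rate falls, since all rates are forced to add to 100 per cent.

```python
def profile_rates(counts):
    """Convert raw act counts per category into percentage rates on the grand total."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]

# Hypothetical counts for four categories (illustration only, not study data).
before = [300, 200, 20, 10]
after = [360, 200, 20, 10]  # only the first category increases

rates_before = profile_rates(before)
rates_after = profile_rates(after)

for rate_b, rate_a in zip(rates_before, rates_after):
    print(f"{rate_b:6.2f} per cent -> {rate_a:6.2f} per cent")
```

The raw counts of the last three categories are unchanged, so the whole of the decline in their rates comes from the method of measurement and none of it from the process observed.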

The situation with regard to the matrix is similar, but with addi- 
tional complications. First the number of cells in the matrix increases 
with each increase in group size, and we should like to be able to 
work at least in the size range from two to ten. Even when group size 
is the same, the matrices we wish to compare are usually different in 
total number of scores, so we are again forced to rates as the basis of 
comparison. Similarly, the amount of activity of each member is 
empirically dependent upon the amount of activity of each of the 
others, and there are considerable differences in magnitude. 

On the matrix there is an additional special problem which may 
have some generalized interest. It will be recalled that in order to 
make the matrix in the first place, we have to take the total amounts 
of participation of each member as the basis for ranking them on the 
matrix. Consider a sample matrix for a group of three persons, Joe, 
Bill, and Henry. After these three men have finished their meeting, 
we find the total number of scores each man has initiated, and rank 
them on this basis—Henry, 421, Bill, 336, and Joe, 243, total 1000 
scores. The matrix is then constructed as follows: 


          Illustrative Interaction Matrix for a 3 Man Group

                    TO
                                       Sum to   To group
                                     specific       as a
FROM         Henry    Bill     Joe    persons      whole    Total

Henry           --     139      81        220        201      421
Bill           148      --      56        204        132      336
Joe            110      42      --        152         91      243

Total          258     181     137        576        424     1000
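
The internal consistency of the illustrative matrix can be checked with a short sketch. Two entries, Henry to Bill (139) and Bill to the group as a whole (132), are taken here as the values implied by the row and column totals; that is an assumption of this sketch, as are all the variable names.

```python
members = ["Henry", "Bill", "Joe"]

# Acts addressed by each member (outer key) to each other member (inner key).
# None marks the empty diagonal. Henry -> Bill (139) is implied by the totals.
to_persons = {
    "Henry": {"Henry": None, "Bill": 139, "Joe": 81},
    "Bill": {"Henry": 148, "Bill": None, "Joe": 56},
    "Joe": {"Henry": 110, "Bill": 42, "Joe": None},
}
# Acts addressed to the group as a whole. Bill's 132 is implied by the totals.
to_group = {"Henry": 201, "Bill": 132, "Joe": 91}

def sent_to_persons(sender):
    """Sum of acts a member addressed to specific persons (row sum)."""
    return sum(v for v in to_persons[sender].values() if v is not None)

def received_from_persons(receiver):
    """Sum of acts addressed to a member by the others (column sum)."""
    return sum(
        to_persons[s][receiver] for s in members if to_persons[s][receiver] is not None
    )

initiated = {m: sent_to_persons(m) + to_group[m] for m in members}
received = {m: received_from_persons(m) for m in members}
grand_total = sum(initiated.values())

print("initiated:", initiated)
print("received: ", received)
print("total:    ", grand_total)
```

The ranking of members on the matrix follows the `initiated` totals, which is the ranking operation whose effect on sampling variability is in question.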


The uniformities I described earlier were derived from matrices like 
this, and we have found that marked departures from the empirical 
averages in given cells, and departures from their regular gradation, 
are usually connected with specific features of the conditions we are 
able to identify, and to a certain degree predict. For example, if we 
manipulate things ahead of time so that one man holds a contrary 
opinion on an issue the group is discussing, we may not be able to 
predict that he will be rank 1 man on the total scores given out, but we 
can predict with a fair degree of confidence that he will receive more 
in total than a man in his rank position usually receives. These differ- 
ences are usually marked enough that we are willing to place confidence 
in them even though we do not have an appropriate test of significance 
of difference. 

But suppose the differences are not so marked and we wish to compare 
two matrices. What is an appropriate test? How has the ranking 
operation, for example, leaving all other problems aside, affected the 
variability of the numbers that appear in the cell showing the total for 
the top man? Or to generalize the problem slightly—suppose we have 
two groups of three men each with total output of the following 
kinds: 
        Group I              Group II

        Joe      75          Clyde    100
        Henry    50          Stephen   60
        Bill     20          Walter    40


is there any reasonable statistical model that will make a comparison 
of these two groups possible? We have done enough exploration of 
the matrix to feel that it provides a good way of examining the data, 
so that we will be disappointed to come to the conclusion that it 
cannot be handled statistically. But again, as in the case of the profile, 
in constructing our method of analysis we have arrived at a constella- 
tion of measures for which we do not know the sampling distributions. 
We have hopes that the problem can be attacked mathematically as 
well as empirically. 
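
As a purely descriptive first step, the two groups can be put on a common base by ranking the members and expressing each man's output as a rate on his group's total. The sketch below does only that; it does not supply the missing sampling distribution, and the function name `rates_by_rank` is hypothetical.

```python
def rates_by_rank(outputs):
    """Rank members by total output and express each as a per cent of the group total."""
    ranked = sorted(outputs.items(), key=lambda item: item[1], reverse=True)
    total = sum(outputs.values())
    return [(name, 100.0 * count / total) for name, count in ranked]

group_1 = {"Joe": 75, "Henry": 50, "Bill": 20}
group_2 = {"Clyde": 100, "Stephen": 60, "Walter": 40}

rates_1 = rates_by_rank(group_1)
rates_2 = rates_by_rank(group_2)

for rank, ((name_1, rate_1), (name_2, rate_2)) in enumerate(zip(rates_1, rates_2), start=1):
    print(f"rank {rank}: {name_1:<6}{rate_1:6.1f}   {name_2:<8}{rate_2:6.1f}")
```

Whether the differences between rates at the same rank are larger than chance would produce is exactly the question that remains open.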


SUMMARY 


To summarize, I have described a method of observing and recording 
social interaction which produces quantitative measures of what appear 
to be theoretically important variables. Preliminary explorations indi- 
cate that the process observed has system-characteristics, and our 
measures are empirically as well as logically interdependent. If this 
is true, a theory of the system as such seems possible, and in fact as- 
sumes a central position in the research. This may turn out to be the 
most important generalized implication of the research. Success in 
solving the problem of how to handle the interdependence of variables 
in small social systems may have applications in treating other types 
of empirical systems not necessarily social. At present, the fact of 
interdependence of variables leaves us without appropriate tests of 
significance of difference on our major distributions of measures. It 
appears that we will have to have some skilled help in the application 
of mathematical statistics in order to solve these problems in any 
fundamental way. It is to be hoped that statisticians will become inter- 
ested in the field of small group research as they have in other substan- 
tive fields, and will help to solve some of its particular problems. 
Moreover, it may turn out, as it has before, that a new substantive field 
will provide the ground for the development of some new statistical 
methods of general significance. 





RELATIONS BETWEEN PRICES, CONSUMPTION, 
AND PRODUCTION* 


Karl A. Fox 
Bureau of Agricultural Economics 


I 


This paper summarizes the history and present status of efforts to 
derive statistical demand functions from time series—typically 
series of annual observations. Supply functions are mentioned only 
incidentally. Special problems involved in estimating demand relation- 
ships for time units materially shorter or longer than a year are also 
disregarded. 

The statistical derivation of “demand curves” is a development of 
the present century. Aside from the pioneer attempts of Benini [1], 
Moore [2, 3], and one or two others, applied work in this field did not 
really get under way until after World War I. Considering the effect 
of economic and political upheavals upon the continuity of research, 
it is not surprising that we arrive at mid-century with some major 
methodological questions unsettled and only a limited number of 
accepted results. 

The late development of statistical demand analysis was due to its 
dependence on two previously unrelated disciplines, economic and 
statistical theory, and also to its dependence upon the scope and ac- 
curacy of published economic data. 

The requisite economic theory was available at an early date. 
Cournot [4] stated the economic theory of demand in a form which 
lent itself to numerical applications and suggested that “it would be 
easy to learn, at least for all articles to which the attempt has been 
made to extend commercial statistics, whether current prices are 
above or below” the value which would maximize gross revenue. How- 
ever, fifty years went by before statistical concepts even imperfectly 
adapted to demand analysis became available. The theory of correla- 
tion was elaborated during the 1890’s, and several more years elapsed 
before anyone tried to apply it to price-quantity relationships. 

It would take us too far afield to discuss the slow development of 
published economic data—particularly continuous time series on pro- 
duction, consumption and income. In this country, such series on 
national income and on food consumption date from the 1930’s. 





* Address, joint meeting, American Statistical Association, Econometric Society, and American 
Economic Association, Chicago, December 29, 1950. 
Agricultural price analysis in the 1920’s was seriously hampered by 
inadequate data, and agricultural data prior to World War I were 
still more limited in scope and accuracy. Nevertheless, it was self- 
evident to Moore [3] that “the most ample and trustworthy data of 
economic science” were official statistics. 

The objectives of statistical demand and supply analysis from its 
inception have included prediction and control. In 1917 Moore [3] wrote: 
“The business of economic science... is to discover the routine in 
economic affairs. It aims to separate out the elements of the routine, 
to ascertain their interdependence, and to use the knowledge of their 
connections to anticipate experience by forecasting from known changes 
the probabilities of correlated changes. The seal of the true science is 
the confirmation of the forecasts; its value is measured by the control 
it enables us to exercise over ourselves and our environment.” 


II 


Progress in the analysis of demand may be measured either in terms 
of the number and accuracy of specific results or in terms of the de- 
velopment of methods. By far the largest concentration of published 
price-consumption analyses relates to agricultural products. There are 
a number of reasons for this, including the abundance of official 
statistics, the relative homogeneity of agricultural commodities over 
time, the simplicity of the “identification problem” for many farm 
products,¹ and the dominant role of public institutions in agricultural 
research. Research results of the Department of Agriculture and the 
State agricultural colleges are published as a service to farmers and the 
general public. Similar research may be done by any number of manu- 
facturing and merchandising organizations and consulting economists, 
but the results are seldom published. In consequence, the extent and 
quality of price-consumption analyses for nonfarm products are hard 
to determine. 

In the 1920’s, economists in the U.S. Department of Agriculture and 
in the State agricultural colleges made numerous analyses of price- 
quantity relationships for agricultural commodities. The primary ob- 
jective of these studies was to provide information by means of which 
farmers could adjust their production and marketing plans. Although 
the rate of publication of agricultural price analyses has slowed down 
considerably since about 1933, the results of the earlier period have 





1 Discussed in Section IV of this paper. The essence of the problem is that price-quantity observa- 
tions are points of intersection of a demand curve and a supply curve. Whether the statistical regression 
of price on quantity will approximate the demand curve, the supply curve, or neither depends upon 
the relative magnitude of shifts in each curve and the correlation (if any) between the shifts. 
been used, modified and extended in various places. At the present time 
demand analyses of some sort exist for aggregates such as all farm 
products, all foods, food livestock products, meat animals and meats, 
and for a considerable number of individual products.² Supply or 
acreage-response analyses have been made for potatoes, cotton, flax- 
seed, hogs, eggs, chickens and some other commodities. 

Statistical demand analyses for nonfarm products may be said to 
date from 1914, when Moore [2] derived his oft-cited positively inclined 
“demand curve” for pig iron. Whitman [5] derived demand functions 
for steel which showed the negative slope required by economic theory. 
Several analyses have been made of the demand for automobiles, and 
at least one each for furniture, refrigerators, electric ranges, washing 
machines and vacuum cleaners and for boots and shoes. Roos has pub- 
lished analyses for some raw industrial commodities [6] and, with von 
Szeliski, an elaborate study on the demand for automobiles [7]. Whit- 
man [8] published the results of some experiments on short-run re- 
sponses of retail sales to price changes. There are doubtless a large 
number of analyses in the files of private consultants and the research 
divisions of large corporations, but these are not generally available. 


III 


The methodology of statistical demand analysis has received much 
more attention in professional journals than has the publication of 
commodity studies. At the present time persons doing applied work in 
demand analysis may be divided into three groups. The first group 
carries on in the tradition of Moore and Ezekiel, using the single 
equation, least squares approach and relying upon judgment to cope 
with pitfalls such as multicollinearity and nonidentifiability. The 
second group supplements this approach with the application of bunch 
map analysis to select “useful” variables and to avoid multicollinearity. 
The third, centering around the Cowles Commission, uses a multiple 
equation approach and takes explicit account of the so-called “identi- 
fication problem.” The methods used by the three groups were largely 
developed in three successive decades. 

Henry L. Moore was the principal founder of the first and earliest of 
these groups. His books on Economic Cycles [2; 1914] and Forecasting 
the Yield and Price of Cotton [3; 1917] furnished the inspiration for 
much of the agricultural price analysis which was carried on in the 
United States during the 1920’s. 

There are two main points of interest in Moore’s 1917 book for the 





2 Some new analyses by the author appear in the July 1951 issue of Agricultural Economics Research. 
present discussion. The first is his practical resolution of the difficulties 
which seem to flow from the general equilibrium approach of Walras. 
Moore writes: “One of the discouraging aspects of deductive, mathe- 
matical economics is that when a complete theoretical formulation is 
given of the possible relations of factors in a particular problem, one 
despairs of ever arriving at a concrete solution because of the multi- 
plicity of the interrelated variables. But the attempt to give statistical 
form to the equations expressing the interrelations of the variables 
shows that many of the hypothetical relations have no significance 
which needs to be regarded in the practical situation.” This was a 
necessary conclusion if any applied work was to be done at all. 

The second point is what we would now regard as a naive faith in 
the efficacy of multiple correlation analysis. “No matter what may be 
the number of factors in the economic problem, it is specially fitted to 
make a ‘quantitative determination’ of their relative strength; and 
no matter how complex the functional relations between the variables, 
it can derive ‘empirical laws’ which, by successive approximations, will 
describe the real relations with increasing accuracy. ... When the 
method of multiple correlation is thus applied to economic data it in- 
vests the findings of deductive economics with ‘the reality and life of 
fact’; it is the Statistical Complement of Deductive Economics.” 

This faith in multiple correlation characterized much of the applied 
work of the 1920’s. Henry Wallace [9; 1920] decided that hog receipts 
at eleven markets were a more accurate indicator of hog prices than 
receipts at Chicago, and that prices of Connellsville coke were a better 
indicator of the demand for hogs than were bank clearings outside New 
York City. His sole basis was that the preferred variables gave a mul- 
tiple correlation of .70 while the others yielded only .65. A relatively 
sophisticated practitioner, B. B. Smith [10; 1926], in describing the 
technique of multiple curvilinear correlation by successive approxima- 
tions, wrote “As long as it is possible by further approximations to 
raise the value of rho” (the index of multiple correlation) “continued 
approximations are justified.” The only restraint on this process was 
that the curves should be smooth. Smith [11; 1925] was the first agri- 
cultural price analyst to adjust multiple correlation coefficients for 
“degrees of freedom.” In 1929 Ezekiel [12] noted that error formulas 
had been little used up to that time. 
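
For reference, the adjustment of a multiple correlation coefficient for degrees of freedom is usually written today in the form below. This is the now-standard textbook expression, given here as an assumption about what is meant; it is not claimed to be the exact formula Smith used. $R$ is the unadjusted multiple correlation, $n$ the number of observations, and $k$ the number of explanatory variables.

```latex
\bar{R}^{2} = 1 - \left(1 - R^{2}\right)\frac{n - 1}{n - k - 1}
```

With hypothetical values $n = 20$ and $k = 3$, an unadjusted $R = .70$ becomes $\bar{R} \approx .63$, an adjustment larger than the gap between .70 and .65 on which a choice of variables might otherwise rest.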

Nevertheless, by the end of the 1920’s, leaders of this first group had 
recognized and suggested solutions for several of the major problems of 
the single equation, least squares approach. Holbrook Working [13; 
1925] pointed out that the curves we could hope to approximate with 
the agricultural data then available were demand curves of dealers 
rather than consumers. He also called attention to the fact that errors 
or disturbances in the independent variables gave a downward bias to 
least squares regression coefficients. Elmer Working [14; 1927] gave a 
clear account of what is now called “the identification problem.” 
Schultz [15; 1928] calculated weighted regressions to allow for the 
presence of errors in the explanatory variables. Recognition of sampling 
errors and tests of significance came at the very end of the decade and 
occupied a separate chapter in Ezekiel’s 1930 book [16] on correlation 
analysis. Schultz’s article [17] on the standard error of a forecast ap- 
peared in the same year, although it had been to some extent antici- 
pated by Working and Hotelling [19] in 1929. 
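
Holbrook Working's point about errors in the independent variable can be illustrated by simulation. The sketch below is not drawn from any of the studies cited; the true slope, the error variances, and all names are assumptions chosen for illustration. With a true explanatory variable of unit variance and an independent observation error of unit variance, the least squares slope in large samples shrinks toward one half of its true value.

```python
import random

def ols_slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20000
true_slope = 2.0

x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [true_slope * x + rng.gauss(0.0, 0.5) for x in x_true]
# The analyst sees x only with an independent error of unit variance.
x_observed = [x + rng.gauss(0.0, 1.0) for x in x_true]

slope_clean = ols_slope(x_true, y)
slope_noisy = ols_slope(x_observed, y)

print(f"slope using error-free x: {slope_clean:.3f}")
print(f"slope using observed x:   {slope_noisy:.3f}")
```

The shrinkage factor in this setup is var(x) / (var(x) + var(error)), so the bias grows with the share of error in the observed series.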

The two monuments of the first group were Ezekiel’s “Methods of 
Correlation Analysis” [16; 1930] and Schultz’s “The Theory and 
Measurement of Demand” [18; 1938]. Schultz’s applied work belongs 
with this group although some of his theoretical chapters go beyond the 
usual scope of its interests. 

The second group doing work on demand analysis relies on methods 
developed by Ragnar Frisch [20; 1929 and 21; 1934]. Frisch was con- 
cerned with the danger of obtaining spurious results due to the com- 
bined (and unrecognized) effect of random errors and high inter- 
correlation between the explanatory variables. He believed that this 
situation was very common in practice, and wrote [21] that “a sub- 
stantial part of the regression and correlation analyses which have 
been made on economic data in recent years is nonsense for this very 
reason.” To cope with this problem, Frisch developed his method of 
“statistical confluence analysis by means of complete regression sys- 
tems.” This technique was used extensively by Tinbergen in business 
cycle analysis [22; 1939] and by Stone [23; 1945] and Prest [24; 1949] 
in the analysis of price-consumption relationships. 

The third group is largely identified with the Cowles Commission 
and is almost wholly a development of the past decade. Marschak 
[25] traces the systematic consideration of the identification problem 
back to an unpublished memorandum by Frisch in 1938. The first 
major article on what is frequently called the Cowles Commission 
technique was published by Haavelmo in 1943 [26]. The main feature 
of the Cowles Commission approach is its emphasis upon the simul- 
taneous determination of interdependent relationships. Moore [27] and 
other analysts had used two or more equations to indicate an equilib- 
rium solution, for example, the intersection of a supply and a demand 
curve to determine price. Tinbergen [22] calculated large numbers of 
equations which were theoretically interdependent, but his method of 
fitting assumed that each of them was statistically independent. 
The Cowles Commission approach is new and different in that the 
methods of fitting flow from specification of an economic model in 
terms of the joint probability distribution of all variables included in 
it. The “identification problem” takes on central importance and the 
conditions for identifiability of (say) a demand curve in a multiple 
equation structure have been worked out on a level of mathematical 
sophistication unknown to the price analysts who first raised the prob- 
lem in the 1920’s. 

The third group has already erected its theoretical monument in 
Cowles Commission Monograph No. 10, “Statistical Inference in 
Dynamic Economic Models” [25; 1950]. The amount and high quality 
of effort that has gone into the Cowles Commission approach is truly 
impressive. However, its applications so far have been limited in num- 
ber and the areas in which it is superior to other methods have not yet 
been clearly defined. Its basic assumption is that “economic data are 
generated by systems of relations that are, in general, stochastic, 
dynamic, and simultaneous” (Marschak [25; 1950]). However, there 
are certain cases, particularly in agricultural price analysis, where 
simultaneity is of limited importance. In such cases it may be doubted 
whether the elaborate procedures of the Cowles Commission will 
improve or even change the results of the single equation approach 
within the limits of sampling error. 


IV 


Obstacles to progress in the analysis of price, supply, and consump- 
tion relationships are of several types. These include the availability 
and accuracy of economic data, the difficulty of identifying “demand 
curves” in the presence of other simultaneous relations, and technical 
problems such as multicollinearity, serial correlation and choice of 
functional forms. 

Relatively few economic data are collected according to the specifica- 
tions of economic theory. Hence, demand analysis is frequently check- 
mated by the non-existence of series which even approximate the 
required economic variables. The likelihood that analysis will be 
frustrated by “missing links” increases rapidly as more and more 
complicated economic structures are considered. While we can perhaps 
supply these links in the future, it may be utterly impossible to recon- 
struct them for the past. The substitution of other series, such as 
production for consumption, may lead to useful insights in a simple 
model, but two or more makeshifts of this sort may completely destroy 
the significance of a complex one. 

A good deal of data exists in the hands of private firms but is not 
made available to the public. These data would include information 
bearing on the cost and supply functions of the firm. With its special 
knowledge of factors on the supply side, the firm might be able to 
derive and “identify” demand curves for its products even though 
published information was insufficient to permit outsiders to do so. 

Even where data are conceptually adequate, they may be seriously 
inaccurate due to limitations of the collection process. Eventually a 
wide range of economic statistics may be based on probability samples 
and their standard errors known, but this is not true of most existing 
series. Sarle [28, 29] appraised the farm price and yield estimates of 
the Department of Agriculture from a sampling viewpoint, and more 
work of this sort is needed. The accuracy of data based on enumera- 
tions may suffer from varying degrees of incompleteness and bias. It 
will frequently be better to make rough allowances for the errors in 
each variable than to ignore them completely. 

The “identification problem” is inherent in the nature of economic 
data. A set of simultaneous price-quantity observations describes the 
points of intersection of a supply curve and a demand curve. Unless 
additional information is available (for example, on the variables 
causing shifts or “disturbances” in each curve) we do not know whether 
a curve fitted to the observations is a demand curve, a supply curve, 
or some uninterpretable combination of the two. 
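
The point may be made concrete with a simulated market. The sketch below assumes linear demand and supply curves with slopes of minus one and plus one, and independent random shifts in each; these values, and every name in the code, are hypothetical. For convenience it fits quantity on price. The fitted slope approaches the demand slope when supply shifts dominate, the supply slope when demand shifts dominate, and neither when the two are of equal size.

```python
import random

def ols_slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def fitted_slope(sd_demand_shift, sd_supply_shift, n=20000, seed=0):
    """Slope of quantity on price fitted to simulated market-clearing observations."""
    rng = random.Random(seed)
    demand_slope = 1.0  # demand: q = 10 - demand_slope * p + u
    supply_slope = 1.0  # supply: q =  2 + supply_slope * p + v
    prices, quantities = [], []
    for _ in range(n):
        u = rng.gauss(0.0, sd_demand_shift)  # shift in the demand curve
        v = rng.gauss(0.0, sd_supply_shift)  # shift in the supply curve
        # Each observation is the intersection of the two shifted curves.
        p = (10.0 - 2.0 + u - v) / (demand_slope + supply_slope)
        q = 2.0 + supply_slope * p + v
        prices.append(p)
        quantities.append(q)
    return ols_slope(prices, quantities)

print("supply shifts dominate:", round(fitted_slope(0.1, 1.0), 2))
print("demand shifts dominate:", round(fitted_slope(1.0, 0.1), 2))
print("shifts of equal size:  ", round(fitted_slope(1.0, 1.0), 2))
```

Nothing in the fitted observations themselves distinguishes the three cases; only outside information on the sources of the shifts does so.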

The Cowles Commission has done a great service by generalizing the 
problem of identification and working out the mathematical require- 
ments for its solution. Their work will force economists generally to 
take more explicit account of this problem than they have in the past. 
Fortunately, the identification problem can be readily solved for an 
important class of agricultural commodities. For many of these, 
particularly annual crops, current production is not influenced by 
current price. Hence, a net relation between production and current 
price will approximate a demand function. In Marschak’s terminology 
[25] this demand function will be a “uniequational complete model.” 
Most applications of the single equation approach which have yielded 
useful results relate to this model. 
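
A minimal algebraic sketch of this case follows. It assumes a linear demand curve, production $q_t$ fixed before the current price $p_t$ is known, and a demand disturbance $u_t$ uncorrelated with production; the symbols are introduced here for illustration and are not taken from Marschak's notation.

```latex
\begin{aligned}
\text{demand:}\quad & p_t = \alpha - \beta q_t + u_t, \\
\text{supply:}\quad & q_t = z_t \quad (\text{determined before } p_t), \\
\operatorname{cov}(q_t, u_t) = 0
\;\Longrightarrow\;\quad &
\frac{\operatorname{cov}(p_t, q_t)}{\operatorname{var}(q_t)}
= \frac{-\beta \operatorname{var}(q_t) + \operatorname{cov}(u_t, q_t)}{\operatorname{var}(q_t)}
= -\beta .
\end{aligned}
```

Under these assumptions the single equation regression of price on production recovers the slope of the demand curve, which is the sense in which the model is complete in one equation.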

Frisch’s concern over multicollinearity may have been justified by 
the laxity of correlation practice in the 1920’s and early 1930’s. But 
Frisch’s own work has helped to make investigators more wary. 
Schultz [18] experimented with Frisch’s technique and concluded that 
it had no advantages over the combined least-squares and graphic 
methods used in his own work. Also, as Haavelmo has pointed out 
[30], few if any economic structures logically imply multicollinearity. 
Accidental multicollinearity (or an approach to it) in economic data 
most frequently arises from major cyclical swings in prices, income, 
production and consumption. Intercorrelations resulting from this 
cause may be greatly reduced by shifting to first differences. 
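
A small sketch shows the effect of differencing on intercorrelation. Two artificial series share one long cycle and otherwise move independently; the labels `income` and `production`, the cycle length, and the error sizes are all hypothetical choices for illustration.

```python
import math
import random

def correlation(x, y):
    """Simple correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def first_differences(series):
    """Change from each observation to the next."""
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(1)  # fixed seed so the run is reproducible
n = 400
# A common cyclical swing with a 40-period cycle.
cycle = [10.0 * math.sin(2.0 * math.pi * t / 40.0) for t in range(n)]
income = [c + rng.gauss(0.0, 1.5) for c in cycle]
production = [c + rng.gauss(0.0, 1.5) for c in cycle]

corr_levels = correlation(income, production)
corr_diffs = correlation(first_differences(income), first_differences(production))

print(f"correlation of original series:  {corr_levels:.2f}")
print(f"correlation of first differences: {corr_diffs:.2f}")
```

The cycle changes little from one period to the next, so it contributes heavily to the original series and only slightly to their first differences.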

Usually the choice of explanatory variables is severely restricted by 
economic reasoning. Admissible series to represent a given factor will 
generally be closely correlated with one another. If various possible 
demand indicators, for example, are highly correlated, I suspect that 
little is to be gained by trying all of them to see which one most nearly 
closes a set. The stability of net regression coefficients as new variables 
are added may be observed without recourse to a complete tilling. 

The importance of serial correlation may also be overrated in the 
technical literature. Serial correlation in the original variables is not 
disturbing. If an explanatory variable goes through a major cycle, the 
“dependent” variable must also contain the net effects of this move- 
ment. Serial correlation in the residuals is a danger signal, but it can 
usually be reduced by transforming the original variables to first 
differences. 
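
The same device can be sketched for serial correlation in the residuals. The example assumes a disturbance that follows a first-order autoregressive scheme with coefficient 0.9; this scheme, the coefficient, and the function names are assumptions made for illustration only. The first-order serial correlation of the residuals is computed from a regression on the original variables and again from a regression on first differences.

```python
import random

def ols_residuals(x, y):
    """Residuals from a least squares line of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    intercept = mean_y - slope * mean_x
    return [b - intercept - slope * a for a, b in zip(x, y)]

def lag1_autocorrelation(e):
    """First-order serial correlation coefficient of a series."""
    n = len(e)
    mean_e = sum(e) / n
    num = sum((e[t] - mean_e) * (e[t - 1] - mean_e) for t in range(1, n))
    den = sum((v - mean_e) ** 2 for v in e)
    return num / den

def first_differences(series):
    return [b - a for a, b in zip(series, series[1:])]

rng = random.Random(2)  # fixed seed so the run is reproducible
n = 500
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
disturbance = [0.0]
for _ in range(n - 1):
    # Each disturbance carries over nine-tenths of the previous one.
    disturbance.append(0.9 * disturbance[-1] + rng.gauss(0.0, 1.0))
y = [2.0 * a + d for a, d in zip(x, disturbance)]

r_levels = lag1_autocorrelation(ols_residuals(x, y))
r_diffs = lag1_autocorrelation(
    ols_residuals(first_differences(x), first_differences(y))
)

print(f"serial correlation of residuals, original variables: {r_levels:.2f}")
print(f"serial correlation of residuals, first differences:  {r_diffs:.2f}")
```

Differencing works well here because the assumed coefficient is near one; with a weakly autocorrelated disturbance it would tend to introduce negative serial correlation instead.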

V 


I should like to turn now to some suggestions for applied work. First, 
concerning the choice of method: 

The applicability of different methods depends on the mechanism 
by which the observed data were generated. The Cowles Commission 
addresses itself to the general case, in which the data are generated by 
systems of relationships that are “stochastic, dynamic and simul- 
taneous” [25]. If it is clear that two or more simultaneous relation- 
ships are involved, it may be necessary to use the Cowles Commission 
method of fitting. However, a priori information on the parameters 
and shifts of one or more of the relationships may enable us to derive 
the other relationships by fitting one equation at a time. One situation 
which is mathematically trivial but which justifies many of the single 
equation analyses for farm products is that in which the simultaneous 
supply curve is a vertical line. 

In some cases where a multiple equation approach is theoretically 
indicated, the increase in accuracy over single equation methods may be 
slight. For example, I doubt that a system explaining the demand for 
one commodity need include an equation expressing the generation of 
consumer income. With reference to a micro-system centering on the 
supply and demand curves for potatoes, disposable income might as 
well be treated as an exogenous variable. As a rule, I believe that 
equations explaining the margin between farm and retail prices may be 
fitted independently of equations relating retail prices to consumption 
and disposable income. 
Explicitly, my suggestions on choice of method boil down to the 
following. 

(1) Describe, by means of diagrams and mathematical symbols, the 
structure of the demand-supply system which appears to be operative; 

(2) Determine whether particular relations in this structure are 
logically (i.e., mathematically) identifiable, or if certain modifications, 
such as choosing a time unit too short to permit “current” supply to 
respond to “current” price, will render them so; 

(3) Decide whether the identifiable relations can be approximated 
by a single equation method of fitting, taking account of a priori 
information; 

(4) If this is not possible, decide whether the data are adequate in 
concept and numerical accuracy to justify simultaneous fitting of the 
entire system of relations, either directly or by the method of reduced 
forms. 

I have already noted my reservations about “bunch analysis.” In 
agricultural price analysis today we rarely try to use more than three 
explanatory variables; often only two. High intercorrelations are 
readily detected in such small systems, and are not even very common 
if the data are expressed in first differences. The danger of multi- 
collinearity if we introduce two measures of consumer income or two 
price series for the same commodity is so obvious that we choose either 
one series or the other (or a linear combination of the two—i.e., an 
index number) at the beginning of the analysis. 

Where the single equation approach is appropriate, I think greater 
attention should be given to the effect of errors in all variables. There 
are some rather simple techniques by which our knowledge of either 
absolute or relative errors in the variables can be taken into account. 
Logically, a relation between the current retail price and current con- 
sumption of a commodity should be reversible. If the variables in- 
cluded in an analysis form a complete system, the explanation of one 
variable in terms of the others should be nearly perfect in the absence 
of errors and “shocks.” In such a case, a rough allowance for errors in 
all variables will reduce the spread between the elementary regression 
lines. The level of error in each variable should, where possible, be 
estimated from information about its method of collection and con- 
struction. Tintner’s variate difference approach [31] might be tried as 
either a supplement or an alternative to such information. 
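
One simple technique of this kind is a weighted regression in which the ratio of the error variances in the two variables is taken as known. The sketch below uses the Deming form of that regression, chosen here for illustration and not specified in the text; the true slope of 1.5, the error sizes, and all names are hypothetical. It compares the weighted slope with the two elementary regressions.

```python
import math
import random

def moments(x, y):
    """Variances and covariance of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x) / n
    syy = sum((b - mean_y) ** 2 for b in y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return sxx, syy, sxy

def elementary_slopes(x, y):
    """Slope dy/dx implied by the regression of y on x, and by that of x on y."""
    sxx, syy, sxy = moments(x, y)
    return sxy / sxx, syy / sxy

def weighted_slope(x, y, error_variance_ratio):
    """Deming slope; the ratio is var(error in y) / var(error in x)."""
    sxx, syy, sxy = moments(x, y)
    d = error_variance_ratio
    root = math.sqrt((syy - d * sxx) ** 2 + 4.0 * d * sxy ** 2)
    return (syy - d * sxx + root) / (2.0 * sxy)

rng = random.Random(3)  # fixed seed so the run is reproducible
n = 20000
true_x = [rng.gauss(0.0, 2.0) for _ in range(n)]
# Both variables are observed with independent errors of unit variance.
x_obs = [t + rng.gauss(0.0, 1.0) for t in true_x]
y_obs = [1.5 * t + rng.gauss(0.0, 1.0) for t in true_x]

slope_y_on_x, slope_from_x_on_y = elementary_slopes(x_obs, y_obs)
slope_weighted = weighted_slope(x_obs, y_obs, 1.0)

print(f"regression of y on x:            {slope_y_on_x:.3f}")
print(f"weighted for errors in both:     {slope_weighted:.3f}")
print(f"regression of x on y (inverted): {slope_from_x_on_y:.3f}")
```

The two elementary regressions bracket the weighted slope, and the better the information on the relative errors, the narrower the range within which the relation need be sought.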

In closing, I should like to enter a plea for improved communication 
between econometricians of Cowles Commission caliber and applied 
economists working on particular commodities and industries. Most 
American economists are unable to follow expositions in matrix notation, 
particularly when the elements of the matrices are first or second 
order partial derivatives. I think most of us can follow through simple 
illustrations in terms of two or three endogenous variables if the full 
set of calculations is given. The presentation of simple illustrative 
material is a pedestrian job, but it is a concession which every success- 
ful teacher makes to beginning students. Wider understanding of the 
technique will mean more concrete experiments with it in different 
economic fields. Some applications will prove trivial, but others will 
raise important problems for the econometrician to solve. 

Another important task, in my opinion, is the classification of actual 
commodities and industries according to “structural” types. For 
example, agricultural commodities might be grouped into types such as 
(1) perishable commodity, one major outlet, current production and 
sales not influenced by current price; (2) storable commodity, two or 
more independent markets, current sales influenced by current price 
and anticipations of future price; and so forth. The number of distinct 
types in a field such as agriculture will not be unduly large. I believe 
the Cowles Commission could perform a real service to applied econo- 
mists by working out a few detailed examples of the analysis of different 
types of structures found in practice. 
ACTUARIAL SCIENCE—A SURVEY OF THEORETICAL 
DEVELOPMENTS* 


CHARLES A. SPOERL
Aetna Life Insurance Company


ACTUARIAL SCIENCE, at least as the term is used in the English speaking
countries, differs radically from some other sciences in that
great amounts of practical material are included. For this reason, a 
survey of the field, such as we customarily attempt at the turn of 
centuries and half centuries, is most effectively made in two articles, 
one on theory, one on practice. Conceive, if you please, that botany, 
horticulture, and greenhouse operation were all lumped together under 
the heading Plant Science, and you would have something of the same 
situation. 

Here we are concerned with theoretical developments and it is just 
here that we come up against the first problem. While it is true that no 
sharp line of demarcation exists between theory and practice, it is not 
too difficult to make a rough separation of the material on empirical 
grounds. This, however, does not provide a categorical answer to the 
question of just what, fundamentally, constitutes a theoretical de- 
velopment of Actuarial Science. Before attempting to answer this 
question, a brief review of the subject matter is useful in order to 
provide a point of departure. 

There is probably no better method of summarizing the content of 
actuarial theory than to list and comment on the subjects studied by 
beginners in the profession in the order that they are taken up in their 
course of study. After passing tests to assure himself of the adequacy 
of his formal mathematical equipment, which includes some training 
in finite differences and the theory of probability and mathematical 
statistics, the student at once plunges into the basic insurance mathe- 
matics generally known by the term “life contingencies.” This subject 
presupposes the existence of a mortality table, which might be thought 
of as sprung fully computed from the head of Jove; it has been defined 
by the famous English actuary, George King, as “the instrument by 
means of which are measured the probabilities of life and the probabil- 
ities of death.” From this table are derived the net premiums and 
reserves for the various forms of life insurance policies as well as the 
expectation of life and the solution to population problems. The stu- 
dent is then led into the intricacies of working with multiple decrement 
tables, which enable him to handle the effect of sickness, withdrawal, 
etc., along with mortality. Finally, he must learn the mathematical
methods of computing tables, given the basic figures. 

The remainder of the course has to do with the way mortality tables 
actually come into being—how they may be constructed from popula- 
tion data or from the records of insurance companies. Here there are 
two problems involved. First, there is the extraction of a set of mor- 
tality rates from the data. The resulting series of figures usually shows 
quite an irregular progression, which is at variance with our ideas of 
what a mortality table should be. The remaining task is to substitute 
for this sequence a smoothly progressing series without doing too much 
violence to what might be termed the “indications” of the observed 
series. The term “graduation” is applied to this process, and a great 
deal has been written on the subject both in actuarial journals and 
elsewhere. Naturally, both of these processes may be applied to other 
matters than mortality rates, for example, to sickness rates, accident 
rates, etc.; and graduation covers the smoothing of series arising in 
fields far removed from insurance and demography. 

In European countries, the university courses go on to more abstruse 
matters including the theory of risk and the use of advanced mathe- 
matical techniques. In the English speaking countries, where university 
courses do not usually progress beyond the elementary parts of the 
subject and the actuarial student is pursuing his studies “on the job,” 
the advanced topics are left to individuals with a special bent. 


WHAT ARE THEORETICAL DEVELOPMENTS? 


Here, then, is the field of actuarial science in very brief outline, stop- 
ping short of the practical parts of it. It is too much to expect develop- 
ments of a mathematical nature in life contingencies any more than one 
would look for new material in a course of elementary mechanics. 
George King put it this way in the preface to his text book written for 
the British Institute of Actuaries and published in 1887: “This volume, 
from the nature of the case, includes but little that is actually new in 
the way of investigation.” Of course, the arithmetic of the newer 
policies and of more recent valuation methods has had to be worked 
out, but this has presented few difficulties. The foundations of the sub- 
ject are another matter. Since the theory of probability has been 
revolutionized, it has become imperative to reexamine in the light of 
the new techniques of this theory all the results flowing from the basic 
assumption of as ancient a piece of baggage as the mortality table. 

Returning now to the question proposed—what constitutes a theo- 
retical development—it becomes evident that there are two general 
fields which can produce such developments. The first is purely
mathematical—the creation of mathematical processes to implement
actuarial ideas as well as the application of existing methods which have not
been used in the actuarial field but have proved their worth elsewhere. 
The other is the examination of the axioms and fundamentals of the 
actuarial processes so as to be assured that the real meaning of what 
actuaries are doing is what they believe it to be, or, failing this, to 
establish the limits wherein the traditional actuarial formulas are valid. 
Much more time has been spent on the first type of development. As 
the subject is mathematical, almost algebraic, in nature, the results 
can be described and evaluated with precision, a procedure which is 
impossible in the other field. What better reason could one want for 
making a start with the strictly mathematical developments? These 
have been most extensive in the fields of graduation and interpolation. 


MATHEMATICAL DEVELOPMENTS 


At the turn of the century, which is a good reference point for a mid- 
century survey, there were, in addition to procedures based on in- 
terpolation, three methods of graduating mortality tables in general 
use. These three methods will be described first, saving the interpola- 
tion methods until later. 


Early Graduation Methods 


The basic method of graduation is the free-hand drawing of a curve 
among points representing the observed rates of death, charted ac- 
cording to age. This graphic method is of course still in use and pre- 
sumably always will be. The only significant improvement that has 
been devised is the addition above and below each charted point of two 
points distant by some convenient multiple of the standard deviation 
of the ordinate, considered as a frequency variate. These points serve 
as a guide to the reliability of the observations and to how much de- 
parture from them can reasonably be ascribed to chance fluctuations. 
For example, if the “convenient multiple” is 2/3, one would expect about
half of the graduated points to fall within the intervals formed by the 
pairs of guide points. 
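
The coverage implied by any given multiple can be checked numerically. The sketch below assumes, as the text does not state, that the chance fluctuations of an observed rate are approximately normal; the function name `coverage` is hypothetical.

```python
import math


def coverage(k: float) -> float:
    """Probability that a standard normal variate lies within k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


# Tabulate the expected share of points inside the guide intervals
# for a few candidate multiples of the standard deviation.
for k in (2.0 / 3.0, 1.0, 2.0, 3.0):
    print(f"multiple {k:.3f}: expected share inside = {coverage(k):.3f}")
```

Under the normal assumption a multiple near two-thirds gives a share close to one half, while a multiple of 3 would enclose nearly every point.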

Another popular method of graduation is peculiar to life insurance. 
It depends on Makeham’s Law, which states that in many mortality 
experiences the force of mortality, $\mu_x$, at age $x$, is made up of a constant
plus a geometric progression, viz.,


$$\mu_x = A + Bc^x,$$


where A, B and c are constants to be determined. Although most
mortality experiences exhibit some deviation from this type of relationship,
a certain amount of variation has been condoned because a
mortality law of this type results in a material simplification of the 
arithmetic of joint-life premiums and reserves, amounting, in the most 
important field, to the use of a single entry table in place of a whole 
series of double entry tables. 

The problem of finding the most satisfactory methods of fitting a 
curve of this type to the data was a major concern of actuaries at the 
turn of the century. Once the exponential constant c was known, the 
others could be determined by the method of least squares. Various 
methods of finding c were explored; oddly enough, it was not until the 
mid-thirties that anyone had enough wit to graph $\mu_x$ minus various
constants on logarithmic paper and determine from the graphs the 
geometric progression, i.e., the curve nearest to a straight line. Recent 
years have produced methods of modifying the constants in different 
parts of the series in such a manner as to fit the unadjusted data more 
closely and yet preserve the arithmetical simplifications referred to. 
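
The remark that A and B follow by least squares once c is known can be illustrated with a short sketch. It assumes NumPy is available; the function names and the idea of scanning a grid of trial values of c are illustrative assumptions, not a description of any historical procedure.

```python
import numpy as np


def fit_makeham_given_c(ages, mu, c):
    """Least-squares A and B in mu_x = A + B * c**x for a fixed trial value of c."""
    ages = np.asarray(ages, dtype=float)
    mu = np.asarray(mu, dtype=float)
    # With c fixed the model is linear in A and B.
    X = np.column_stack([np.ones_like(ages), c ** ages])
    coef, *_ = np.linalg.lstsq(X, mu, rcond=None)
    resid = mu - X @ coef
    return float(coef[0]), float(coef[1]), float(resid @ resid)


def fit_makeham(ages, mu, c_grid):
    """Try each c in c_grid and keep the fit with the smallest sum of squared errors."""
    best = None
    for c in c_grid:
        A, B, sse = fit_makeham_given_c(ages, mu, c)
        if best is None or sse < best[3]:
            best = (A, B, float(c), sse)
    return best


# Synthetic data generated from assumed constants, purely for illustration.
ages = np.arange(30, 81)
mu = 0.0007 + 0.00005 * 1.10 ** ages
A, B, c, sse = fit_makeham(ages, mu, np.linspace(1.05, 1.15, 101))
print(A, B, c)
```

The graphical device mentioned in the text serves the same end by eye: for the right constant A, the logarithm of $\mu_x - A$ plots as a straight line whose slope gives log c.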

In the third of the classical graduation methods, the adjusted or 
graduated term of a series is a linear compound of a fixed number of 
unadjusted terms among which it is centrally located. A symmetrical 
series of weights constitutes the array of coefficients of the linear com- 
pound; both the weights and the number of terms involved may be 
varied to produce a whole family of formulas. Some graduation for- 
mulas achieve smoothness by riding roughshod over the irregularities 
of the data; others are relatively faithful to the progression of the data 
at the expense of smoothness. The more terms in the linear compound, 
the smoother the graduation formula can be made. There is no best 
formula; one that will produce fine results when used on one series of 
data may show poor ones on another. 
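
A linear compound of this kind can be sketched in a few lines. The five-term weights used here, (-3, 12, 17, 12, -3)/35, are an assumption chosen for illustration: they are the familiar least-squares cubic smoothing weights, not one of the actuarial formulas the text has in mind. NumPy is assumed.

```python
import numpy as np

# Symmetrical weights summing to one (illustrative choice, see lead-in).
WEIGHTS = np.array([-3.0, 12.0, 17.0, 12.0, -3.0]) / 35.0


def graduate_linear_compound(v, weights=WEIGHTS):
    """Replace each interior term by a weighted sum of the terms centred on it.

    The first and last (len(weights) - 1) / 2 terms have no full set of
    neighbours and are simply omitted from the result.
    """
    v = np.asarray(v, dtype=float)
    return np.convolve(v, weights[::-1], mode="valid")


# A cubic series passes through these particular weights unchanged.
x = np.arange(10.0)
v = x ** 3 - 2.0 * x
print(graduate_linear_compound(v))
```

Lengthening the weight array trades fidelity for smoothness, which is the compromise the paragraph above describes.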

Originally, the only formulas of this family which were considered 
practical were the ones which involved simple arithmetical operations. 
These were known as “summation formulas,” since most of the arith- 
metic consisted of the summation of groups of terms, somewhat like 
a complicated variety of moving average. Since the general adoption of 
calculating machines this limitation is no longer necessary. The question
of which came first, the machine or the machine method, is a
fascinating one. The linear-compound graduation formula which 
produces the smoothest results when judged by the reduction of error 
in the third differences has been discovered independently no less than 
four times, in 1871, 1915, 1916, and 1918. DeForest’s original discovery 
is obviously pre-machine, and Larus’s in 1918 definitely post-machine, 
since it was accompanied by a detailed description as to how to do the 
work on a machine. The other two are difficult to judge. Perhaps what
happens at an intermediate time is that the machine is getting just 
common enough to promote rather than rule out an investigation which 
appears to involve considerable arithmetical detail. 

The difference-equation method of graduation was developed during 
such a transition period. Whittaker’s original paper was read in 1919 
in Scotland, where there is a dearth of calculating machines. The 
practical application dates from 1924 when Henderson’s first paper 
was published in this country, where there are a great many of them. 
Approximations to the practical results of the new method but using 
the summation technique appeared in Scotland in 1926, a curious 
procedure which was evidently designed to secure the advantage of 
the method while bypassing the arithmetic. But enough of the phi- 
losophy of the machine! A really new method of graduation calls for a 
short description even in a brief survey like this, particularly since this 
one gets back to fundamental principles. 


Difference-Equation Graduation 


The point of departure of the new method is to establish measures of 
the opposing forces of graduation: smoothness and closeness of fit. 
Assuming that adequate measures can be found, it is possible to solve 
the graduation problem by maximum and minimum methods: by 
finding the smoothest graduated series compatible with a certain de- 
gree of closeness of fit. The usual measure of smoothness, S, is the sum 
of the squares of some order of differences of the n unknown graduated 
values: $u_1, u_2, \ldots, u_n$. Thus if third differences are used,


$$S = \sum_{x=1}^{n-3} (u_{x+3} - 3u_{x+2} + 3u_{x+1} - u_x)^2.$$

The customary measure of closeness of fit, F, is the sum of the squares
of the departures of the n graduated values, $u_x$, from the n ungraduated
values, $v_x$, sometimes weighted as in least-squares formulations. Thus
if $w_x$ are the weights,


$$F = \sum_{x=1}^{n} w_x (u_x - v_x)^2.$$


Now, if $\varepsilon$ is a relative measure of the compromise between smoothness
and closeness of fit, the expression to be made a minimum is
$S + \varepsilon F$, and the necessary conditions for a solution come from setting
the n partial derivatives of this expression with respect to each of the
n unknowns equal to zero. The result of this differentiation is a set of
n linear equations in the n unknown graduated values. Such a set 
may be solved by machine methods but it is still a long job. If n is 
large, and it is often between 50 and 100, hand methods are completely 
inadequate. 
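
With a modern linear-algebra routine the n equations can be written down and solved directly. The sketch below assumes NumPy; the function name and argument names are hypothetical. Writing K for the matrix that takes the graduated values to their differences of the chosen order, and W for the diagonal matrix of weights, setting the partial derivatives of S + eps*F to zero gives (K'K + eps*W) u = eps*W v.

```python
import numpy as np


def whittaker_henderson(v, w=None, eps=1.0, order=3):
    """Graduated values u minimising S + eps * F.

    S is the sum of squared differences of the given order of u, and
    F is the weighted sum of squared departures of u from v.
    """
    v = np.asarray(v, dtype=float)
    n = v.size
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    # K has shape (n - order, n); K @ u equals the differences of that order of u.
    K = np.diff(np.eye(n), n=order, axis=0)
    # Normal equations from setting the n partial derivatives to zero.
    return np.linalg.solve(K.T @ K + eps * np.diag(w), eps * (w * v))


# Illustrative data: a smooth curve plus reproducible irregularities.
rng = np.random.default_rng(0)
x = np.arange(40.0)
v = 0.001 * x ** 2 + rng.normal(scale=0.05, size=x.size)
u = whittaker_henderson(v, eps=0.1)
print(np.sum(np.diff(v, 3) ** 2), np.sum(np.diff(u, 3) ** 2))
```

A small eps favours smoothness and a large one favours closeness of fit; a series whose third differences already vanish is returned unchanged.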

If the variation in the weights of the different observations may be 
ignored, a marked simplification results. Most of the linear equations 
then follow the same pattern, and thus may be represented by a linear 
difference equation with constant coefficients. This equation proves to 
be factorable into two of lower degree, and the arithmetical solution is 
thereby greatly facilitated. 

During the past twenty years the mechanics of the new method have 
been worked out in detail. There has also been considerable study of 
the theory. A machine has even been constructed, of springs and wires, 
which will make a graduation by this method in several special cases. 
Moreover, one particular case, it turns out, involves such simple arith- 
metic that the graduation can be performed comfortably without a 
machine. 

The only other new graduation method developed in recent years is a 
Scandinavian invention which depends on building up the graduated 
series from an array of second derivatives. This method depends to a 
great extent on the skill of the man making the graduation, and two
independent graduations of the same data might result in a wide variation.
In this respect it stands at the other extreme from the
Whittaker-Henderson process, which, once the end conditions have been set and
the S, F and $\varepsilon$ criteria established, rolls on to an invariable result.


Osculatory Interpolation 


The methods of graduation just described generally involve the sub- 
stitution of a smooth series of values for an unadjusted series of the 
same length. It frequently happens in actuarial work that the unad- 
justed series consists of, say, quinquennial values. In these cases, what 
is required is a series of interpolated values—four to each interval. The 
usual interpolation methods of the calculus of finite differences were 
originally developed for use with smooth analytic functions—such as 
logarithmic or trigonometric tables. When they are applied to more 
irregular data, the results are often not as smooth as could be desired, 
because the arcs that span the intervals between plotted data do not 
join evenly with one another, but have different tangents at the junc- 
tion points. To procure a style of interpolation that came closer to 
fitting actuarial needs, Sprague, back in 1888, constrained the interpolated
arcs to join each other at a fixed angle and a specified
curvature based on the conventional curves, and the process became 
known as osculatory interpolation. At the beginning of the century, 
Karup produced a simpler formula by omitting the matching curvature 
requirement. Work in this field has been active during succeeding 
years. The use of specified tangents and curvatures was abandoned, 
producing more flexible results. We find Henderson writing, in 1924, 
“I prefer, however, to look upon osculatory interpolation as an entirely 
self-supporting operation not depending in any way on the theorems 
of ordinary interpolation. The successive intervals are filled in by 
curves of the specified degree but the constants are not determined by 
equating the differential coefficients at the points of junction to those 
determined from the usual finite-difference interpolation formula. They 
are simply determined so that the coefficients in the two curves meeting 
at that point shall be equal to one another. In this way it is not neces- 
sary to use a curve of so high a degree in order to secure osculation of a 
given order.” 
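
The idea of arcs that share a tangent at each junction can be made concrete with the four-point formula usually cited under the names of Karup and King. The form below is quoted from memory and should be read as an illustrative sketch, not as the formula of any paper mentioned in this survey; the function name is hypothetical.

```python
def karup_king(y, i, s):
    """Interpolate between y[i] and y[i + 1] at the fraction s, with 0 <= s <= 1.

    Uses the four given values y[i - 1] .. y[i + 2], so i must satisfy
    1 <= i <= len(y) - 3.
    """
    # Second central differences at the two ends of the interval.
    d2_left = y[i - 1] - 2.0 * y[i] + y[i + 1]
    d2_right = y[i] - 2.0 * y[i + 1] + y[i + 2]
    t = 1.0 - s
    return (s * y[i + 1] + 0.5 * s * s * (s - 1.0) * d2_right
            + t * y[i] + 0.5 * t * t * (t - 1.0) * d2_left)


# Illustrative quinquennial values, with four interpolated values per interval.
z = [1.0, 4.0, 2.0, 7.0, 3.0, 8.0]
print([karup_king(z, 2, k / 5.0) for k in range(6)])
```

Each arc reproduces the given values at its ends, and adjoining arcs meet with the common slope (y[i + 1] - y[i - 1]) / 2; the curvature is not matched, which is the simplification attributed to Karup above.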

A further modification of the theory was due to Jenkins, who pro- 
duced interpolating curves in 1926 which although preserving tangency 
and curvature at junction points failed to duplicate the original values 
at these points. In this way, a certain amount of graduation could be 
combined with the interpolation process. This novel procedure was
christened “modified osculatory interpolation.” The general theory
covering both types was expounded by Greville in 1944 in such detail 
as to sum up the subject definitely and probably give future research 
in this field the character of a peroration. 


Other Interpolation Methods 


During the past ten years, as the gaps in the theory of osculatory 
interpolation were being filled in, students found that curves produced 
by this method often exhibited a sort of groundswell, from interval to 
interval, of such magnitude as to be objectionable. Other methods of 
interpolation were sought. One line of investigation was to use the 
Whittaker-Henderson approach and require the interpolated values 
to be the set with minimum differences of a specified order, as meas- 
ured by the sum of their squares. This led to a difference equation and 
produced interpolated values each contributed to by all the given 
values. Greville’s elaborate description of this technique appeared in 
Brazil, in Portuguese, in 1946. 

Meanwhile Beers accomplished something of the same thing without 
resorting to the rather cumbersome difference-equation technique. He 
sought a similar set of interpolated values but one in which each would
be a linear compound of just the nearest six of the given values. It 
turned out that this problem could not be solved without making some 
assumptions in regard to what might be termed the degree of hap- 
hazardness to be expected in the series of differences of the order 
minimized, but the resulting formulas gave eminently satisfactory 
results.

Both of these methods have been adapted to the production of series 
of interpolated values of the “modified” type, which does not duplicate 
the given values and thus involves an element of graduation. 

Recently there have been some other interesting theoretical in- 
vestigations. Aubrey White has developed a finite-difference analogue 
of osculatory interpolation based on the equality of certain orders of 
sub-differences, rather than derivatives, at the junction points. Several 
writers have traced the connection between interpolation formulas and 
the summation formulas of graduation. There have been some notable 
contributions to ordinary interpolation: we have Aitken’s elegant 
process of interpolation by “cross-means” and the so-called “throw- 
back” device of E. W. Brown, rediscovered by Camp and Comrie in 
1928. Since this is to be a brief survey, we may not dwell further on this 
topic or describe these interesting developments, but this is as good a 
place as any to refer to the available actuarial literature. A complete 
bibliography is out of the question as an appendage to this survey: the 
tail would be vastly bigger than the dog and imperil its equilibrium. 
Nearly all of the original articles may be found in the actuarial journals, 
principally the Transactions and the Record of the two American 
societies now merged into the Society of Actuaries; the Journal of 
the British Institute, and the Transactions of the Scottish Faculty. 
Developments in graduation and interpolation up to about 1940 are 
conveniently listed in Wolfenden’s book on mathematical statistics; 
they are brought down to date in Greville’s paper in the September 
1948 issue of the Journal of the American Statistical Association. A 
useful elementary work is Morton Miller’s Elements of Graduation, 
published by the Society of Actuaries. 


Miscellaneous Matters 


Prominent in the American actuarial literature of recent years is a 
series of papers on the derivation of mortality rates from the records 
of life insurance companies. The arithmetic of multiple decrement 
tables has been elaborated; the necessary approximations arising from 
the use of the force rather than the rate of mortality have been worked
out in detail. All in all, these investigations are practical rather than
theoretical in nature and will not be reviewed here. In the collateral
field of demography, great strides have been made in the technique
of deriving the many abbreviated tables essential to the adequate portrayal
of the information buried in census data.


342 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1951


An innovation that has been introduced into certain recent mortality 
tables is an allowance for secular changes in the basic mortality rates. 
For many years there has been a steady improvement in mortality, 
which has cut down profit margins on annuities to a vanishing point. 
This has progressed at so rapid a rate that a mortality table has a 
fair chance of being obsolete before its publication date. To counteract 
this tendency, methods have been devised by which a projected in- 
crease in longevity can be “built in” to the mortality tables. Something 
of the sort was done in Britain in 1925; in 1949 Jenkins and Lew devel- 
oped the theory in great detail in this country. 

The concepts of mathematical statistics have been very sparingly 
applied to actuarial matters. This may seem surprising but it is true, 
particularly in this country. The educational program is putting more 
and more emphasis on mathematical statistics and it is to be expected 
that future developments will remedy this neglect. The $\chi^2$ test of
goodness of fit has from time to time cropped up in graduation, but it 
has never won general acceptance. The standard deviations of actuarial 
functions, commonplace in Europe, have been used in America only 
for about 25 years. Two problems in this connection still remain to be 
investigated. The first is how the mortality rate varies at a given age, 
that is, what is its frequency distribution? A satisfactory answer is not 
available because adequate data have never been tabulated in the 
necessary form and detail. The other problem arises because of the 
way life insurance companies make mortality investigations “by 
policies” and “by amounts” rather than “by lives.” Since one “life,” 
that is, one insured person, may have several policies, the variance 
of a mortality rate computed from “policies terminated by death” 
divided by “total policies,” or from “actual death claims” divided by 
“insurance in force,” will be vastly different from the variance of one 
computed in the basic manner. What is needed is a simple technique 
for developing conversion factors. 
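
The size of the discrepancy can be seen in a deliberately simple case. The sketch below assumes that lives die independently with a common probability q and that all of a life's policies terminate together; it is not the general technique the text calls for, and every name in it is hypothetical.

```python
def variance_by_lives(q, n_lives):
    """Variance of the rate 'deaths / lives' for independent lives."""
    return q * (1.0 - q) / n_lives


def variance_by_policies(q, policies_per_life):
    """Variance of 'policies terminated by death / total policies'.

    policies_per_life[i] is the number of policies held by life i.
    """
    total = sum(policies_per_life)
    return q * (1.0 - q) * sum(n * n for n in policies_per_life) / total ** 2


def conversion_factor(policies_per_life):
    """Ratio of the by-policies variance to the by-lives variance (never below 1)."""
    n_lives = len(policies_per_life)
    total = sum(policies_per_life)
    return n_lives * sum(n * n for n in policies_per_life) / total ** 2


# Four lives, one of whom holds five policies.
holdings = [1, 1, 1, 5]
print(conversion_factor(holdings))
```

Replacing policy counts by amounts insured gives the corresponding factor for an investigation “by amounts” under the same assumptions.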

Finally, among the branches of mathematics which have been 
described in some detail in actuarial journals and considered as immi- 
nent or prospective tools for actuaries by their proponents are Fourier 
series, Stieltjes integrals, Tchebychef polynomials, binary calculation, 
Boolean algebra, and integral equations. There has even been an 
attempt to explain mortality rates along the lines of the quantum
theory. Most of these papers were written by Englishmen, and are 
perhaps a mark of that eccentricity which Henry Adams ascribed to 
Englishmen when he visited England in 1863. Still, when the next 
survey is made, some of these ideas, doubtless, will prove to have
fallen on fertile soil. 


THE FUNDAMENTAL CONCEPTS 


So much for a brief account of the improvements in mathematical 
techniques. There remains the examination of the fundamentals. What 
does actuarial science really deal with? Here we have a dearth of
scientific papers, particularly in this country. There have been a few 
in Britain. Continental actuaries have devoted a great many words 
to the subject, but it cannot be said that they have shed much light 
on it. They have done a thorough job of working through classical 
life contingencies in the light of modern mathematical statistics, 
replacing probabilities by frequency distributions and expectations. 
Beyond this, their work has been largely academic, reflecting the 
university approach and the Ph.D. thesis rather than a basic endeavor 
to elucidate the fundamentals. 

The fact is that very little has been done to elevate actuarial science 
to the level of a true science. F. M. Redington recognized this and 
summed up the state of affairs in discussing a paper before the British 
Institute in 1944. He believed that, although there had been consider- 
able improvements in actuarial technique and craftsmanship during 
the last 50 years, there had been little or no real scientific advance. 
He did not believe, however, that the actuary’s subject was entirely 
devoid of scientific content, or that actuaries were entirely devoid of 
scientific ability or curiosity. An examination of the Journal showed 
that almost all their scientific zest and ability were directed towards, 
and indeed dissipated in, theories of graduation and mortality, and 
it was clearly in that field that the obstacle must be sought. Concep- 
tions such as ‘underlying true rates of mortality,’ ‘standard deviation 
of deaths,’ and even ‘probability’ itself, were in the nature of postu- 
lates rather than facts. It was with those shadowy but formidable 
postulates, rather than with the facts, that they were wrestling. 

Although this appraisal was in the nature of a by-blow—the paper 
under discussion dealt with the validity of statistical tests of mortality 
tables—the uncomfortable truths are most pertinent. Many of the
fields in which actuaries ‘dissipated’ their energies are no longer fertile. 
The preoccupation with mathematical mortality laws, so fashionable 
at the beginning of the century, has to a large degree abated. Moreover,
it is now quite generally recognized that these attempts to
approximate mortality rates are an exercise in curve fitting rather than 
an attempt to phrase the underlying philosophy of mortality in mathe- 
matical terms. The theory of graduation and interpolation has probably 
reached its limit, in principle at least, with the development of the 
maximum and minimum technique. The earlier improvements were 
empirical in nature; now any problem is solved by standard mathe- 
matical methods once the objects to be accomplished have been formu- 
lated in mathematical terms. 

The ‘underlying true rates of mortality’ referred to still present 
obstacles. As medicine has progressed in its ability both to understand
and to prevent disease, our concepts of mortality have been progressively
changing. The notion of a single probability of death applicable to a 
specific age has become meaningless. Whether any part of it can be 
salvaged by working with several probabilities applicable to different 
degrees of impairment and to exposure to accidents remains to be seen. 
As new drugs are developed, yesterday’s impairments become greatly 
modified in nature and incidence; as new methods of mass destruction 
are devised, tomorrow’s deaths become more and more unpredictable. 

So far, relatively crude procedures involving statistical frequencies 
and distributions are about the best that can be used. The stability 
of mortality phenomena, over short periods at least, has made these 
methods work. Note has been taken of the lack of adequate scientific 
studies of the statistical distribution of the mortality rate at each age. 
Investigations along these lines could well go hand in hand with the 
study of mortality from a medical point of view. 

Once we know what mortality is all about and how it works at the 
grass-root level, we shall find no lack of mathematical techniques 
ready to hand for the elaboration of an actuarial science. Until the 
foundations are put on a scientific basis, future explorations can 
hardly yield more than small stones for the superstructure of the 
actuarial edifice, and actuarial science will develop more in the direc- 
tion of an art than a science. 





NATIONAL INCOME: STATUS AND PROSPECTS AS SEEN 
BY AN ESTIMATOR* 


GEORGE JASZI
National Income Division, U. S. Department of Commerce 


RECENT TRANSFORMATION OF NATIONAL INCOME STATISTICS 


NATIONAL income statistics underwent a basic transformation
during the past two decades, and particularly during the last. I
shall start with a summary of what I think is the nature of this trans- 
formation in order to facilitate the evaluation of their prospects. 


From measurement of output to depiction of economic process 


The essence of the change, as I see it, was a shift of emphasis from 
measuring total national output to providing a statistical picture of 
the economic process and structure. The magnitude of the transforma- 
tion will be clear if you compare national income data that were 
available, say, before the depression of the thirties, with those prepared 
since the beginning of World War II. The former were in general 
focused on the construction of totals of national income and product— 
that is, of measures of output. The latter present to an increasing 
extent a picture of the economy. 

This picture is constructed by dividing the economy into several 
transactor groups—most frequently, business, consumers, government, 
and foreign nations—and summarizing the transactions of these groups 
with each other by a set of economic accounts. These usually consist 
of separate current accounts for each of the transactor groups and of 
a consolidated saving and investment account. National income and 
product—formerly the sole results of the calculations—are now em- 
bedded, as it were, in this system of accounts. They are presented in 
an additional summary account—the national income and product 
account—in which they are derived by an appropriate combination 
of transactions included in the current and saving and investment 
accounts. 

Needless to say, I do not mean to imply that prior to the recent 
change national income statistics were regarded solely as measures of 
total output and that they were not used to throw light on the eco- 
nomic process and structure. The components from which national 
income and product are aggregated represent the bulk of the informa- 
tion on the nature of the economy which the statistics provide; and the 
importance of this information has been recognized all along. What I
do mean is that no systematic attempt was made to utilize the data 
fully in this respect. 

Similarly, the recent transformation does not imply a lack of interest 
in the measurement of output. It is obvious that such a measurement 
must be the very heart of a significant description of the economy. All 
that is meant is that output is now placed against a background show- 
ing how it is produced, distributed, and used. 


Contribution to measurement of output 


I see two major points of significance in this change. The less impor- 
tant one relates to the perennial discussion regarding the proper defi- 
nition of national output. 

It is true that the establishment of a national economic accounting 
system for depicting the economic process does not supply genuinely 
new criteria for solving these definitional problems. Many definitions 
of national output are compatible with the principles on which this 
system is based. Yet systematic national economic accounting has 
helped the discussion in several ways. 

A great deal of it was obscured by a failure to distinguish clearly 
between the income and the product measurement of output and by 
the lack of a clear grasp of the relation of the two. The establishment 
of an accounting system has made it much harder to fall into this type 
of confusion. 

It has contributed to the solution of definitional problems also by 
depriving them of some of their importance. This is what I mean: A 
substantial part of the discussion revolves around the question whether 
certain items should or should not be included in the total: for instance,
government interest, business taxes, transfer payments, subsidies, 
government services, etc. Prior to the recent transformation a decision 
to omit such moot items from output meant that their record, as far 
as national income statistics are concerned, was lost. Since these items 
tended to be germane to economic investigations in which national 
income data are used, there was often reluctance to exclude them, 
even though that was indicated from the standpoint of measuring 
output. With national income statistics transformed into a statistical 
picture, the record of excluded items is no longer lost, and the problem 
of defining output in the best possible manner can be faced solely on 
its own merit. 

Finally, to the extent that fully satisfactory solutions to the
definitional problems cannot be found (and I believe this to be the case
in many instances), it has been made easier to live with them. Many
of the controversial items are separable in the statistical picture, and
alternative measures of output can be constructed depending on one’s 
tastes. 


Improved understanding of economic mechanism 


The second, and more important, manner in which the change 
has been significant, is by furthering our understanding of the economic 
mechanism. It has done so in two distinct ways. It has thrown into 
clear relief the structure of accounting relations that must always 
hold true among the transactions that are the components of the statis- 
tical picture. The sense in which saving and investment are necessarily 
equal is, of course, the prime example of such a relationship. A great 
deal of the complex and confusing saving and investment discussion 
of the thirties could have been avoided, had the transformation of 
national income statistics been an accomplished fact in that decade. 
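
The sense in which saving and investment are necessarily equal can be illustrated with a minimal sketch. The figures are invented, transfer payments and the statistical discrepancy are ignored, and the function and variable names are assumptions made for this illustration; it is not the layout of any official account.

```python
def saving_and_investment(consumption, government_purchases,
                          domestic_investment, net_foreign_investment,
                          taxes, business_saving):
    """Return (total_saving, total_investment) for a simplified set of accounts."""
    # Product side: output is consumed, bought by government, or invested.
    national_product = (consumption + government_purchases
                        + domestic_investment + net_foreign_investment)
    # Income side: the same total, less taxes and retained business earnings.
    disposable_income = national_product - taxes - business_saving
    personal_saving = disposable_income - consumption
    government_surplus = taxes - government_purchases
    total_saving = personal_saving + business_saving + government_surplus
    total_investment = domestic_investment + net_foreign_investment
    return total_saving, total_investment


# Hypothetical amounts in billions of dollars.
saving, investment = saving_and_investment(
    consumption=190.0, government_purchases=40.0,
    domestic_investment=45.0, net_foreign_investment=5.0,
    taxes=50.0, business_saving=20.0)
print(saving, investment)
```

Changing any single input leaves the two totals equal (up to rounding), because the equality follows from the definitions of the accounts and not from the figures chosen.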

Light has also been thrown on the relative magnitudes of the com- 
ponent flows of the economic process, and the study of the statistical 
relationships among them has been facilitated. In contrast to the ac- 
counting relationships, which are a matter of definition and must 
always hold, these statistical relationships are regularities that hold 
by and large, as a matter of experience, but which can and do change 
in response to technological, institutional, and psychological changes. 
I have in mind, for instance, the relation between consumption and 
disposable income, wages and profits, taxes and tax bases, etc. 


Causes of transformation: economic crises and war 


The causes of the transformation which I have reviewed are not far 
to seek. They are Economic Crisis and War. The great depression of 
the thirties and—for very different reasons—World War II made it 
obvious that complete reliance could not be placed on the automatic 
working of the economic mechanism, and that conscious steps had to 
be taken to ensure its satisfactory functioning. A prerequisite for this 
was a better understanding of its operation. The transformation of 
national income statistics into a statistical picture of the economic 
process and structure was in direct response to these needs. 


FUTURE WORK ON NATIONAL INCOME 


I turn to a discussion of the future. I believe that work and interest 
in national income will remain active and expanding. The probable 
indefinite continuation of economic strain and stress, and of the 
danger of war will ensure that. 

However, I do not envisage a further transformation comparable to 
the one we have witnessed. It seems likely that our economic problems
will be similar to those which we confronted during the thirties and 
forties. And even considerably changed methods of dealing with them
(say, in the direction of further interference with the automatic working
of the market mechanism) would continue to call for data basically
in the same form as presented now. Accordingly, I see further work 
as involving largely refinement and elaboration along lines implicit 
in recent developments. 

The improvements called for may conveniently be grouped under 
three headings: First, the definition of output; second, the statistical 
picture of the economic process and structure, other than the definition 
of output; and, third, the accuracy of the estimates. 


Gauging statistical accuracy 


I shall take up the last point first. It is obviously an important 
point. If the statistics are to help us to understand the performance 
of the economic system, they should be made as accurate as possible. 
Furthermore, in the absence of complete accuracy, the users of the 
data should be enabled to form a judgment about the magnitude of the 
error attaching to the estimates, and be given guidance in the proper 
use of estimates that are subject to such error. 

In the past two decades we have made tremendous strides in im- 
proving the accuracy of national income estimates. That we have 
genuinely consolidated positions which we were holding only tenuously 
in the past, is somewhat obscured by the fact that simultaneously we 
have pushed forward into new territories and that we are again en- 
camped along border lines which we do not firmly hold. I am confident 
that we shall continue to make satisfactory progress in improving the 
reliability of the types of estimates that now exist, and I do not see 
that any radical new plan has to be formulated in this connection. 
Perseverance is what is mainly involved. 

However, we are again likely to extend ourselves further and to 
make estimates not previously attempted. The reliability of these 
new estimates will again be low. And this is what makes the second 
part of our task so essential: to provide users of the statistics with some
notion of their error and a guide to how these imperfect statistics
should be used. I realize that in this respect we have done hardly 
anything at all, and I feel very strongly that work should be done in 
this field. 

But it seems difficult to offer concrete suggestions. While some of 
the work may have to utilize complex tools of mathematical statistics, 
it seems to me that some thought on how statistical errors affect the
type of analysis in which national income estimates are used—as con- 
trasted with the role of errors of measurement in the natural sciences— 
would be helpful as a first step. 

I, for one, feel acutely the need for this type of discussion, because 
I find myself developing a split personality when I think of these 
problems. On the one hand, I am acutely aware of the wide error to 
which some of our most strategic estimates are subject. This error 
appears to be far wider than would be tolerable in the natural sciences. 
On the other hand, I am willing to assert with perfect good faith that 
our statistics have contributed a great deal to our understanding of 
the economy, and that instances in which statistical inaccuracies have 
led to erroneous diagnoses have been comparatively rare. On the face 
of it these two positions seem incompatible. Yet I feel that in some 
sense both are correct. Perhaps a partial answer lies in the fact that 
there is a relatively large number of propositions in the realm of eco- 
nomics that are significant yet so blunt that they can be tested by 
means of data that are subject to a wide margin of error. Perhaps some 
thought on the role of quantitative data in arriving at economic diag- 
noses, such as might be suggested by these considerations, should form 
a background to more technical explorations in this field. 


Elaboration of statistical picture 


I turn to improvements in the statistical picture of the economic 
process and structure, other than those relating to the definition of 
output. 


Constant dollar accounts 


Perhaps the top requirement under this heading is the supplementa- 
tion of the current dollar estimates by estimates in terms of constant 
dollars. What ideally suggests itself is a complete statistical picture 
in terms of constant dollars paralleling the current dollar figures. Some 
thought has been given to deflation on such an ambitious scale, but I 
do not believe that there is prospect for success. 

I do believe that a useful deflation of the national output can be 
undertaken. In this deflation output can be valued either at constant 
market prices or at constant factor costs. Useful calculations of this 
nature have been made in the past. The National Income Division of
the Department of Commerce has recently made a study of this type.¹





1 The results of the study were subsequently published in the Survey of Current Business for 
January 1951. 
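
A deflation of output at constant market prices can be sketched as follows, assuming a two-good economy with invented prices and quantities. The names `value_at_prices` and `implicit_deflator` are hypothetical and belong to this illustration only; a valuation at constant factor costs would substitute factor costs for the market prices used here.

```python
def value_at_prices(quantities, prices):
    """Sum of quantity times price over all goods."""
    return sum(quantities[good] * prices[good] for good in quantities)


# Hypothetical two-good economy; every figure is invented.
base_year_prices = {"bread": 2.0, "steel": 50.0}
current_prices = {"bread": 3.0, "steel": 60.0}
current_quantities = {"bread": 100.0, "steel": 10.0}

# Current dollar output values this year's quantities at this year's prices.
current_dollar_output = value_at_prices(current_quantities, current_prices)
# Constant dollar output values the same quantities at base-year prices.
constant_dollar_output = value_at_prices(current_quantities, base_year_prices)
# Implicit price index: current dollar total divided by constant dollar total.
implicit_deflator = current_dollar_output / constant_dollar_output
```

The sketch presupposes that physical units can be identified behind each current dollar figure, which is the condition said above to hold for most outputs.
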
I am much less hopeful about our ability to arrive at another con-
cept which is sometimes regarded as the aim of deflation—namely a 
measure of the volume of the input of factor services. In the case of 
deflation of outputs we can, in the majority of cases, see our target 
tolerably clearly: the physical units behind the current dollar figures.
For the deflation of factor input no similar workable notion of the 
quantity of factor input exists. To give an extreme illustration, which 
admittedly minimizes the difficulties of deflating output and maximizes 
those of deflating factor input, consider the differential difficulty of 
visualizing the physical units to which (a) a current dollar series on
bread production and (b) a current dollar series on corporate profits
refer.

I am not sure whether what I have just said constitutes an explana- 
tion of why a useful deflation of factor input is not feasible; or whether 
I have merely restated the problem. Would not a genuine explanation 
have to show how it has come about that we have not evolved a notion 
of factor input sufficiently concrete for statistical measurement, 
although we utilize the general notion of factor input constantly in 
economic thought? 

With respect to the deflation of flows other than national income
and product (say, transfer payments, taxes, saving, etc.), I see little
prospect of a general solution. In these cases a notion of physical 
quantity even as vague as the one available for factor input is lacking. 
Undoubtedly, procedures roughly appropriate for specific purposes 
may be found. In studies of disposable income and saving, for instance, 
in which it is desired to examine real relationships that abstract from 
the price factor, something is gained by deflating both magnitudes 
by the cost of living. But this and similar procedures are not wholly 
satisfactory even for the purposes at hand, and certainly do not give 
a clue for a systematic deflation procedure. 
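
The rough procedure just mentioned, deflating disposable income and saving by the cost of living, can be sketched as below. The figures are invented, a single index with base 100 is assumed to apply to both magnitudes, and the helper `deflate` is a hypothetical name; the sketch shares whatever shortcomings the procedure itself has.

```python
def deflate(current_dollars, cost_of_living_index, base=100.0):
    """Express a current dollar amount in dollars of the index base period."""
    return current_dollars * base / cost_of_living_index


# Hypothetical annual figures: (disposable income, saving, cost of living index).
years = {
    "year 1": (200.0, 10.0, 100.0),
    "year 2": (250.0, 20.0, 125.0),
}

# Deflate both magnitudes by the same index, year by year.
real_series = {
    year: (deflate(income, index), deflate(saving, index))
    for year, (income, saving, index) in years.items()
}
```

In these invented figures the whole rise in current dollar income is a price rise, so deflated income is the same in both years while deflated saving is higher in the second.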


Sector saving and investment accounts 


As regards elaborations of the current dollar picture, there is, first, 
the construction of saving and investment accounts for each of the 
major economic groups for which current accounts have been estab- 
lished. As I mentioned initially, only a consolidated saving and invest- 
ment account for the nation as a whole is available at the present time. 
In this consolidated account all financial saving and investment trans- 
actions have been cancelled out—only real investment and the match- 
ing flows of saving remain. Deconsolidation would bring to light these 
financial transactions and would be useful for purposes of monetary 
and financial analysis. 
To be really useful for this purpose, however, deconsolidation would
have to go further than the establishment of separate accounts for 
business, consumers, government, and foreign nations. At least one 
sub-group—financial institutions somehow defined—should be dis- 
tinguished within the business system. 
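
What consolidation conceals and deconsolidation brings to light can be shown in a minimal sketch. The sectors follow the grouping used above, with financial institutions treated as a separate group; the flows, the amounts, and the function name `net_lending_by_sector` are assumptions invented for illustration.

```python
from collections import defaultdict

# (lender, borrower, amount): hypothetical financial flows, invented figures.
financial_flows = [
    ("consumers", "financial institutions", 20.0),
    ("financial institutions", "business", 20.0),
    ("government", "business", 10.0),
    ("business", "foreign nations", 5.0),
]


def net_lending_by_sector(flows):
    """Net lending (+) or net borrowing (-) of each sector."""
    net = defaultdict(float)
    for lender, borrower, amount in flows:
        net[lender] += amount      # the lender acquires a financial claim
        net[borrower] -= amount    # the borrower incurs a liability
    return dict(net)


sector_net_lending = net_lending_by_sector(financial_flows)
# In the consolidated account every claim is matched by a liability.
consolidated_net_lending = sum(sector_net_lending.values())
```

The sector figures differ from zero, while their sum is zero, which is why only real investment and the matching saving survive in the consolidated account. The financial institutions show no net lending at all, although funds pass through them; only the separate flows reveal their role.
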

It may be noted, incidentally, that the deconsolidation of the saving 
and investment account provides a means of integrating national in- 
come statistics with the money flow studies initiated in recent years. 


Deconsolidation of business current account 


Second, the statistical picture as now presented shows the current 
transactions of the business system on a consolidated basis only. De- 
consolidation of the business current account would throw light on 
differences of structure within the business system and on interrela- 
tions among its various parts. It would be particularly useful in the 
study of the behavior of costs and prices. It may also provide a basis 
for integrating conventional national income with input-output statis- 
tics. 


Balance sheet accounts 


Next, a set of balance sheet accounts showing the structure of wealth 
automatically suggests itself. Undoubtedly these data, too, would be 
useful. For instance, a knowledge of the composition of financial 
assets and liabilities would be useful in formulating monetary and 
credit policies and also in the study of consumer and business economic 
behavior. There would also be interest in many types of real assets, 
for instance inventories. 

It is perhaps least easy to say precisely what the usefulness of 
balance sheet information on the stock of fixed capital would be. Not 
that there is no need for data on the stock of plant and equipment. 
On the contrary, this need is great in studies of capital accumulation, 
industrial capacity, etc. However, balance sheet data on fixed capital
would be difficult to interpret for these purposes. The book values in 
which they are expressed are of only limited use in explaining market 
behavior; nor do they throw much light on the underlying physical 
quantities, knowledge of which would be required in studies that 
have technological implications. 


Other elaborations


I have reviewed some of the major elaborations of the statistical 
picture as now presented which suggest themselves most naturally. 
Needless to say, they do not constitute an exhaustive list. For instance, 
estimates of the size distribution of income can be regarded as first
steps toward a further articulation of the consumer sector; and work 
on regional income estimates may be viewed as the elaboration of the 
statistical picture in yet another dimension. 


Danger of over-elaboration 


A danger of over-elaboration is involved in work along the lines 
set forth. Once the fruitful notion of a statistical picture of the economy 
presented by means of an economic accounting system has been con- 
ceived, there is a temptation to refine by pursuing the inherent logic 
of the initial vision, without asking the question whether from the 
standpoint of furthering economic understanding this is in fact the 
best expenditure of effort. No new vistas of understanding are opened, 
and sterility results. 

This type of situation is of course not peculiar to national income 
statistics. A similar rhythm—the initiation of a new vision and its 
subsequent excessive elaboration—is usual in all branches of thought. 
It is extremely unlikely that in our case it can be avoided completely. 
I do not think that we can expect soon again the thrill which we
experienced in the early forties when the elements of the old national income
statistics grouped themselves into what was an essentially new view 
of the economy. We are probably destined to refine and to elaborate 
for some time until some new integration emerges. However, we should 
do our best not to carry mechanical elaboration and refinement to 
excess. 


Improving measurement of output 


Finally, I turn to possible improvements in the measurement of 
total output. 

While the unique feature of the recent transformation was the 
elaboration of a statistical picture of the economic process in the 
framework of which output is produced, distributed, and used, impor- 
tant work has also been done on the proper definition of output. I 
have already explained how the new approach to national income 
statistics has contributed to the solution of definitional problems. 
However, certain very basic problems remain unsolved. I want to 
comment on two of them. 

Viewing the national output as the sum of what is consumed and of 
what is added to the stock of capital, a great deal of thought has been 
devoted to the examination of the notions of “consumption” and “capi- 
tal formation,” and various proposals for an improved definition of 
these two basic categories have been made. 


Improving measurement of capital formation


As regards the latter item, it has been pointed out that generally 
speaking capital formation is recognized only for the business sector 
and that capital formation by consumers and by government is not 
adequately taken into account. Next, shortcomings in the notion of 
net capital formation—i.e. in the definition of capital consumption— 
have received increasing notice. While subtle problems involving the 
definition of capital maintenance under conditions of technical prog- 
ress are also involved, the major practical impetus to the recent discus- 
sion was the feeling that in times of rapid price change business ac- 
counting methods for the using up of fixed plant and equipment, as well 
as of inventories, are inadequate. Finally, the attempts to sidestep
these issues by dealing with gross capital formation have recently led 
to a realization of the fact that further work on what constitutes the 
most useful definition of gross capital formation is needed. Questions 
of durability, and the treatment of repair and maintenance expendi- 
tures are involved. 

My feeling is that we know sufficiently what we are after for useful 
work to be done on these problems, although some of them are very 
intractable. I feel very differently about the discussion which revolves
around the definition of consumption, to which I now turn. 


Definition of consumption 


This discussion is based on the notion that “consumption” ought to 
measure all goods and services which people “enjoy” and that it should 
exclude all other goods and services. Perhaps the earliest dissatisfac- 
tion which investigators armed with this preconception felt was in con- 
nection with the problem of imputation. As is well known, national 
output as conventionally measured includes in addition to monetary 
items such imputed items as the value of wages and salaries paid in 
kind, the value of food produced and consumed on farms, and the 
rental value of owner-occupied dwellings. A convenient short-hand 
explanation of the inclusion of these items is the statement that they 
represent goods and services whose consumption is enjoyed by the 
members of the community. If this explanation is taken at its face 
value, the question arises why national income measurement should be 
limited to the imputed items which I have listed, and a few others, and 
why a host of other items which appear to be of a similar nature— 
housewives’ services, self-administered shaves, etc. etc.—should be 
excluded. Armed with the general notion that consumption should 
measure what is enjoyed, the investigator is off on the quest for a 
logical and systematic delimitation of imputed consumption. 
Closely related to imputation is the problem of occupational expense.
Here the position appeared to be that certain items were included in 
national output even though they did not reflect true enjoyment, but 
merely expense necessary to the earning of income. 

While these problems did not find an adequate answer and consti- 
tuted a chronic though minor irritant to national income thought, a 
second branch of discussion blossomed forth much more luxuriantly. I
refer to the famous controversy about the intermediate output of 
government. Broadly speaking, this discussion took for granted that in 
general we were successful in including in our measure of consumption
whatever we enjoy. But it was believed that in measuring the services
provided by the government we fell short of our usual standards of 
achievement. In this instance, we counted services which are not en- 
joyed. For instance, we counted services that are not enjoyed per se 
but are only instrumental in producing services that are enjoyed
(government services to the business system); or services that are not
enjoyed but constitute a necessary evil (military services). A parallel
search was on for a list of only those government services that are
enjoyed, from which intermediate services of government are excluded.

In the course of this discussion a significant discovery was made. 
It emerged that the problem first noticed in connection with government
services was not confined to them. For every type of government
intermediate product that was isolated, an analogue, now included in
our measures of private consumption, could be found. The inclusion
of “necessary evils” in government services was paralleled by the in- 
clusion of necessary evils in private consumption. Expenditures for 
burglar alarms, watch dogs, and body guards appeared to have the 
same role in private consumption as had defense expenditures in gov- 
ernment services. Analogues to other types of intermediate services 
could also be found. For instance, if a consumer buys oranges f.o.b. 
Florida and pays the Railway Express Agency to have them shipped 
to his home, the services of the latter, which are listed separately in 
private consumption, appear to be very similar in their relation to 
consumer satisfaction to government services provided to the business 
system which facilitate the flow of production from business to con- 
sumers. 

These discoveries led to the proposition that private consumption 
as well as government services be purged. Moreover, at this stage the 
intimate relation of this discussion to that of imputed income and 
occupational expense was realized; and added to the proposed purge 
was a proposal for a more systematic inclusion of imputed items. The 
full proposals amount to extensive demolitions and new constructions
on the national output front. Owing to disagreements among the 
various architects who have submitted specifications and owing to 
cloudiness in their plans, the new skyline of national output cannot be 
envisaged clearly. However, it might differ very substantially from the 
present one. 

I shall now try to state my own position in this controversy. It ap- 
pears to me that if further work is to be done along the suggested lines, 
the basic requirement is a searching and precise examination of what 
we want to and can mean by “consumption.” To clear the ground we 
shall have first to give up a notion which lurks behind the entire dis- 
cussion, and which to me seems entirely unwarranted. This notion is 
that we are now equipped with a workable concept of consumption 
along the lines of “something that yields enjoyment” which we can use 
in determining what items constitute consumption and what items do 
not. 

In fact, we do not use “enjoyment” as a criterion at all. We start 
with a general concept of a final good as one that is not charged to
current cost by business during the accounting period; as contrasted 
with an intermediate good as one that is so charged. We further dis- 
tinguish goods bought by business that are not charged to current cost. 
These are investment goods. The remaining final goods constitute
consumption, except that it may be possible to segregate from this
total, and to assimilate with investment goods, a further category, by 
using some criterion based upon multiple use or durability. 

It is true that we do make marginal modifications in those definitions 
—imputations have been mentioned as examples—but these modifica- 
tions are strictly marginal. They are dictated by ad hoc reasons that 
are usually good ones in the light of the practical use that is made of 
the data. But they do not provide a tool for departing fundamentally 
from the basic manner according to which we now measure consump- 
tion. 

This is the first point to realize. If we desire to introduce funda- 
mental changes we have to set about constructing such a tool instead 
of plunging into ambitious lists of proposals for various exclusions 
from and inclusions in national output, with an almost complete lack 
of systematic discussion of what the notion of consumption is upon 
which these proposals are based. 

I do not know what such an attempt will reveal. But several events 
seem likely. First, I think we shall find that a not negligible part of 
the current proposals are based upon a desire to define output so as to 
make it reflect directly changes in welfare under conditions in which
needs have changed. If this were the fact, there is a fairly respectable 
body of economic theory to tell us that such attempts are no more prac- 
ticable than squaring the circle. 

For instance, the suggestion to exclude defense expenditures on the 
ground that they do not add to welfare can only mean that, as com- 
pared with a situation in which there was no military danger, a com- 
munity that is in such a danger and undertakes defense expenditures is 
no better off than before. This may be true. However, it is also true
that if people got hungrier in general they will not be better off if the 
supply of food increases only pari passu. Yet few would suggest that 
under these conditions an increase of food production should not be 
counted as an increase in output. 

However, even if this source of confusion is eliminated, part of the 
malaise will remain. It springs from the desire to draw a more satisfac- 
tory line between economic production and non-economic pursuits 
than that afforded by the criterion whether the item in question in- 
volves a transaction in monetary form; and from the desire to distin- 
guish between intermediate products and final output by a method 
that is superior to the one which uses the criterion of whether the item 
in question is charged to current expense. 

There is real cause for dissatisfaction here. Items that are obviously 
similar from the standpoint of consumer welfare receive divergent treat- 
ment depending on whether or not they take monetary form and 
whether or not they are charged to current expense by business enter- 
prises. However, I am very doubtful whether something sweeping and 
systematic can be done to improve matters. I should not be surprised 
if we should have to accept the basic dividing lines which we now 
draw, and modify them only in instances in which they are particularly 
irksome for specific purposes. I should say that I exclude general wel- 
fare comparisons from what I call “specific purposes.” 

Finally, even if we lay aside the difficulties considered so far— 
changes in needs, and the problems of delimiting the area of economic 
welfare—we are left with some more. They spring from the fact that 
the consumption goods which a consumer uses constitute factors of 
production, as it were, that are combined by him to produce satisfac- 
tion. Hence our attempt to draw conclusions about changes in satis- 
faction by analyzing changes in lists of consumption goods acquired, 
is subject to the same type of difficulty as attempts would be to 
measure changes in the output of a firm via an analysis of its factor 
input. Obviously technological change will upset such calculations,
although there may be difficulties even in the absence of such change.

I do not know whether there is a solution to the problems arising 
on this score. If the structure of wants were such that their satisfaction 
would be associated with the possession of a clearly defined set of con- 
sumption goods, whereas other consumption goods were wanted only 
because they are a sine qua non of the use of this set, an improvement 
over the present measurement of consumption might be found. The 
procedure would probably be to count as the physical volume of output 
only the commodities directly associated with the satisfaction of wants, 
weighting them with expenditure weights that include in addition to 
direct expenditures for them also the expenditures for the items that 
are a sine qua non for their use. 

For instance, in the example which I have used earlier, in which a 
consumer buys oranges f.o.b. Florida and pays separately for the 
services which the Railway Express provides in transporting them to 
his home, we would count only the oranges transported, weighting 
them with expenditure weights that include expenditures for the Rail- 
way Express. We would not count separately the volume of services— 
say, in miles of transportation provided—of the Railway Express, 
since a change in this volume that is not reflected in the oranges de- 
livered to the consumer, but is due, for instance, to a different routing 
of the consignment, would not be indicative of a change in the state of 
satisfaction of the consumer. 
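
The two ways of counting in the oranges example can be set side by side in a short sketch. All prices, route lengths, and names below are invented for illustration; in particular, the assumption that the express charge is proportional to distance is mine and is not part of the example above.

```python
# Hypothetical base-period prices; all figures are invented.
ORANGE_PRICE_PER_BOX = 4.0       # f.o.b. Florida
EXPRESS_CHARGE_PER_BOX = 1.0     # base-period charge for delivery
BASE_ROUTE_LENGTH = 4.0          # hundreds of miles per box in the base period
RATE_PER_UNIT_DISTANCE = EXPRESS_CHARGE_PER_BOX / BASE_ROUTE_LENGTH


def combined_volume(boxes):
    """Count only oranges delivered, weighted by orange plus express expenditure."""
    return boxes * (ORANGE_PRICE_PER_BOX + EXPRESS_CHARGE_PER_BOX)


def separate_volume(boxes, route_length):
    """Count oranges and transportation services as two separate volumes."""
    oranges = boxes * ORANGE_PRICE_PER_BOX
    transport = boxes * route_length * RATE_PER_UNIT_DISTANCE
    return oranges + transport


# Same number of boxes delivered, but the consignment is routed a longer way.
direct = (combined_volume(12), separate_volume(12, route_length=4.0))
rerouted = (combined_volume(12), separate_volume(12, route_length=6.0))
```

With the longer routing the separately counted measure rises although the consumer receives the same twelve boxes, while the combined measure is unchanged. This is the sense in which the separate count would not be indicative of a change in satisfaction.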

I think it will appear upon further reflection that clear-cut instances 
of this type are few, and that even instances that appear to be clear-cut 
at first sight cease to be so upon further examination. In general, a 
separation of “want-fulfilling” and “sine qua non” commodities does 
not seem possible, and I cannot envisage a revamping of the notion 
of consumption along these lines. 

I conclude, therefore, that not much that is useful will result from 
pursuing in their present form the current proposals for a basic change 
in the delimitation of the area of consumption. However, on the posi- 
tive side it may be said that these proposals did indirectly have the 
beneficial effect of stimulating interest in the classification of govern- 
ment services, which still is a somewhat neglected field. Moreover, con- 
sideration of the types of problems to which attention has been drawn 
might lead to the quantitative study of features of our economy 
hitherto excluded from national income statistics—namely of the 
welfare aspects of the items now charged to current cost by the business 
system, in other words, of the conditions of work. This might be an 
interesting extension of conventional national income work. 





STATISTICAL MEASUREMENT AND ECONOMIC 
MOBILIZATION PLANNING* 
Adequate information on resources-requirements balance sheets is
indispensable for top policy decision and its administration, both
in peace and in war. Compilation of detailed balance sheets for use 
in mobilization is a great statistical undertaking. The purpose of this 
paper is to outline the kinds of data and methods used in mobilization 
planning, for the information of the statistician who suddenly finds 
himself thrown into some part of the job. 


PEACETIME PLANNING 


In peacetime when the country is carrying out what used to be 
thought of as “normal defense effort,” the military is nevertheless 
attempting to anticipate the problems of wartime. At such times, the 
problem is one of determining whether or not the strategic plans of the 
Joint Chiefs of Staff are within the country’s economic potential. To 
make such an evaluation, requirements for key resources such as 
manpower, steel, copper, aluminum, and lumber must be calculated 
and measured against potential supplies. This appraisal is made by 
using relatively crude techniques, without going into detailed type, 
size, shape, and form breakdowns. 

In the case of military requirements the task is handled by the 
Munitions Board working with the three Services; in the case of non- 
military requirements, by regular agencies of government. Common 
assumptions are supplied to the several agencies by the National 
Security Resources Board and the results integrated and appraised by 
that agency. 


PARTIAL MOBILIZATION 


In a period of partial mobilization such as the present, to assure 
that military and essential civilian production can obtain required 
resources when needed the government interferes with the normal func- 
tioning of the economic system. It must then make decisions regarding 
the extent to which industrial expansion should be encouraged and 
facilitated, the cuts to be made in scarce materials allowed for making 
civilian consumer products, the kinds of goods which should be
produced in larger or in smaller quantities. Other decisions must be made
regarding how production and distribution are to be altered with a
minimum of government control, preferably indirect control.

As the degree of partial mobilization grows, techniques used in war- 
time must be substituted. Peacetime “resources-requirements analysis” 
must be expanded to provide basic information for new policy decisions. 
Military requirements are calculated in greater detail. Analysis of 
government statistical records and trade association compilations is 
intensified. 


WARTIME OPERATION 


In wartime, important resources such as steel are allotted among
specific programmed uses. However, such an allotment cannot be
made in terms of steel in general, but only of steel of a specific shape, form, size, and
composition, and in just the right quantity to meet a definite production 
schedule. To do this job, detailed records of industry must be used in 
place of general overall figures. This necessitates detailed calculations 
of needs by sub-contractors, which are passed on to prime contractors, 
consolidated and passed on to government as firm requirements. In 
government, all of the requirements for steel in all of its detail must 
be aggregated by specific programs, such as the several munitions 
categories, industrial expansion, foreign aid, mining, agriculture, etc., 
and then consolidated into overall totals. 

When this is done and the totals are measured against the prospec- 
tive supply, decisions must be made as to what programs are to be 
curtailed, and by how much. 


BALANCE SHEET COMPARISONS 


The adequacy or inadequacy of resources to meet requirements can 
be presented in a form currently termed “balance sheets”. These can 
also show the adjustments necessary to bring supply and demand into 
balance. 

For the composite programs, military and non-military, dollars and 
manpower are the only common units of measurement that can be 
used. For the production of specific goods, measurements are needed 
also in terms of materials, facilities, fuels, and electric power. In World 
War II, it was found that overall munitions programs could be analyzed 
effectively in terms of steel, copper, and aluminum. For some particular 
programs, tests of adequacy of resources had to be made in terms of 
plant capacity, or supplies of key components, at least over a short 
period; for others, in terms of rubber, or lumber, or some other less 
generally used material. 


ROUGH MEASUREMENT OF RESOURCES AND REQUIREMENTS PRIOR TO
ACTUAL MOBILIZATION 


Detailed measurement of requirements and resources, covering 
thousands of types of goods and services and expressed both in dollars 
and in amounts of needed materials and other resources, is an expensive, 
elaborate process, and not necessary until supplies are expected to be- 
come critically short. 

For long-run policy decisions of mobilization planning—and even, 
to some extent, wartime operations—military programs and the related 
industrial programs should be handled as approximations prepared 
within a reasonably short period of time. Only in this way can military 
strategy and the related operational programs be evaluated in terms 
of what is economically and humanly possible, and promptly readjusted 
to changing military and resources situations. 

Rough or aggregate balance sheets can be developed in two supple- 
mentary ways: (1) in terms of a summary Gross National Product 
analysis, and (2) in terms of a rough programming of needs for key 
resources. 

1. One may prepare a summary tabulation in dollar terms of the current out- 
put of goods and services and then see if production can be enlarged and 
readjusted to meet the proposed military requirements. Account must be 
taken of construction required during the period and the goods and services 
to be made available to the consumer-civilian economy. Capacity must be 
figured in terms of manpower, materials, and facilities which can be counted 
on during an anticipated mobilization period. This is a very useful tool to 
show rather promptly the limitations on mobilization programs in terms of 
the job the economy can do and the amount of shifting over to the produc- 
tion of war goods which is required. It may also be used to show the impact 
of mobilization programs on production of consumers’ goods and the in- 
flationary pressures created. 

2. This general study must be supplemented by a series of balance sheets, one
for each of the key resources which may limit overall programs. Then it can
be shown what detailed adjustments are needed to bring about a balance.
This “programming” of the uses of a material can be done in terms of
physical units, with or without analysis of special types and shapes. A rough
summary presentation is adequate for many policy problems.
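
The balance sheet for a single key resource described in item 2 can be sketched as follows. This is only an illustration under stated assumptions: the class name, the program names, the tonnages, and the rule of curtailing all unprotected programs by one common factor are hypothetical and are not taken from any agency procedure.

```python
from dataclasses import dataclass


@dataclass
class BalanceSheet:
    """Rough balance sheet for one key resource, in physical units."""
    resource: str
    supply: float                    # prospective supply for the period
    requirements: dict[str, float]   # program name -> stated requirement

    def total_requirements(self) -> float:
        return sum(self.requirements.values())

    def deficit(self) -> float:
        """Positive when requirements exceed supply, zero otherwise."""
        return max(0.0, self.total_requirements() - self.supply)

    def curtail(self, protected: set[str]) -> dict[str, float]:
        """Scale unprotected programs by one common factor so the total fits supply.

        The common-factor rule is an assumption made for this sketch only.
        """
        fixed = sum(v for k, v in self.requirements.items() if k in protected)
        flexible = self.total_requirements() - fixed
        if self.deficit() == 0.0 or flexible == 0.0:
            return dict(self.requirements)
        factor = max(0.0, (self.supply - fixed) / flexible)
        return {k: (v if k in protected else v * factor)
                for k, v in self.requirements.items()}


# Hypothetical figures, in millions of tons of steel.
steel = BalanceSheet(
    resource="steel",
    supply=100.0,
    requirements={"munitions": 40.0, "industrial expansion": 20.0,
                  "foreign aid": 10.0, "civilian": 50.0},
)
adjusted = steel.curtail(protected={"munitions"})
```

With these made-up figures the stated requirements exceed supply by 20, and the unprotected programs are each cut to three quarters of their stated level so that the adjusted total equals the supply.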


The problem of translating estimated end products requirements into 
basic materials and other resources needs can be attacked by using 
rough historical relationships. World War II data on the average 
metal content per million dollars of major categories of military prod- 
ucts are helpful in estimating the burden of proposed military programs 
on key materials. There are, however, two difficulties. In cases of 
substantial changes in the design of military products the metal 
content must be recomputed. Also, adjustments have to be made to
reflect price changes since World War II. 
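
The use of historical metal content per million dollars, with an adjustment for price changes, might be sketched as below. The function name, the price index, and all figures are hypothetical; the sketch assumes that deflating the program to base-period prices is an adequate price adjustment, which the text does not specify.

```python
def metal_requirement(program_dollars, price_index, tons_per_million_base):
    """Tons of metal implied by a program valued at current prices.

    program_dollars: program value in current dollars.
    price_index: current price level relative to the base period (base = 1.0).
    tons_per_million_base: historical tons of metal per million base-period dollars.
    """
    base_dollars = program_dollars / price_index   # deflate to base-period prices
    return base_dollars / 1_000_000 * tons_per_million_base


# Hypothetical: a $3 billion program, prices 50 per cent above the base period,
# and 400 tons of steel per million base-period dollars.
steel_tons = metal_requirement(3_000_000_000, 1.5, 400.0)
```

A change in product design would call for a new coefficient rather than a new price index, which is the first of the two difficulties noted above.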


MORE DETAILED CALCULATION OF RESOURCES AND REQUIREMENTS 
IN TIME OF MOBILIZATION 


The two frameworks of analysis just described are also used in 
formulating the balance sheets for use in mobilization or wartime, 
except that they are expanded to allow for greater detail. The Gross 
National Product analysis can be greatly enlarged, notably in the 
military categories. Programming can be carried out in terms of 
specific uses and for specialized types and shapes of the needed materi- 
als. The latter process may become “scheduling,” with the aim of 
directing particular quantities of resources to specific uses. The 
necessary statistical operation becomes complex; rough approximations 
are no longer adequate. Even so, measurement is still a process of 
statistical estimating. 

At first, analyses must be based on data already available. But as 
soon as possible additional data must be collected from industry and 
the military procurement groups. The government agency responsible 
for the balance sheet compilation lays down rules concerning the units 
of measure, time periods, program classification and resources to be 
surveyed. Figures from industry are needed for the calculation of both
supplies and requirements. Only industry can furnish certain facts
about capacities and even foreign supplies. On the requirements side, 
no one knows more accurately than industry what resources are needed 
to produce specific items. 

To determine materials requirements in the detail needed for wartime
programming and scheduling, two techniques are used: (1) bills of
materials;¹ and (2) experience ratios.


1. Where the amounts of materials that enter into a product are well-known, 
bills of materials are extensively used, for major military products as well 
as for many civilian products. Where accurate bills of materials exist, re- 
quirements calculations of end products in terms of raw material needs are 
quite satisfactory. Bills of material change frequently, however, and always 
need to be re-examined. In some cases where bills of materials are not avail- 
able for particular products, notably for civilian-type products, average 
resource usage is computed for more or less homogeneous groups of end 
products. Input-output studies have used this type of data. High-speed 
electronic calculators make it possible in a short period of time to aggregate 
the materials, parts, and labor needed to produce a large number of items 
in a series of military and civilian programs. The technique cannot yet be 
applied generally to current programs, since the use-ratios or coefficients
for many groups of manufactured products are still in process of development.

¹ A bill of materials states the amount of materials required to make one unit of a product.

Prototype bills of material are also used. Where a category of munitions 
is composed of a large number of reasonably homogeneous items, it is some- 
times possible to estimate fairly accurately the material requirements by 
taking the item most typical of the group and computing the material needs 
on that basis. 

2. World War II data on the relationship between the raw materials put into
a plant producing an item, and the outflow of finished products, can be used 
in requirements estimating. During World War II it was found that these 
“experience ratios” remained stable enough so that they constituted a rea- 
sonably satisfactory working rule for estimating the quantities of materials, 
parts, and supplies to be allocated to particular programs. But the ratios 
did vary. The problem is to revise the ratios in the light of changes in design 
and content of products so as to make them applicable to future allocations 
or programming periods. Thus, bills of materials and experience ratios are 
used in conjunction. 
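
The two techniques can be sketched as follows. The products, per-unit quantities, and function names are hypothetical, and the experience ratio is taken here simply as past input divided by past output, which is one plausible reading of the text rather than a documented formula.

```python
from collections import defaultdict

# Hypothetical bills of materials: tons of material needed per unit of product.
BILLS = {
    "tank":  {"steel": 30.0, "copper": 0.5},
    "truck": {"steel": 3.0, "copper": 0.1, "aluminum": 0.05},
}


def requirements_from_bills(schedule, bills):
    """Aggregate raw-material needs for a production schedule (product -> units)."""
    totals = defaultdict(float)
    for product, units in schedule.items():
        for material, per_unit in bills[product].items():
            totals[material] += units * per_unit
    return dict(totals)


def requirements_from_ratio(planned_output, past_input, past_output):
    """Experience ratio: scale a past input/output relationship to planned output."""
    return planned_output * (past_input / past_output)


needs = requirements_from_bills({"tank": 100, "truck": 1000}, BILLS)
ratio_estimate = requirements_from_ratio(500, past_input=1200.0, past_output=400)
```

Where a bill of materials exists the first function applies; where only plant-level records exist the second gives a working estimate, to be revised as designs change.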


ADEQUACY OF STATISTICAL DATA 


In peacetime, the Government has neither the authority nor the 
staff to collect and analyze detailed figures of individual establish- 
ments. 

Government statistics collected and in large part published in peace- 
time are weak in the very areas which are of critical importance to 
mobilization. Information is needed particularly on production ca- 
pacity for key components, and on current uses of production capaci- 
ties, on metals consumption by products, both munition and civilian 
types, and on inventories of raw, semi-finished, and finished metals 
and products. Also, information must be assembled on prices for com- 
ponents, equipment, and finished products, notably munitions items. 
In fact, the analyst and the administrator need details about all aspects 
of the productive process. 

They have at their disposal the vast collection of World War II 
data, but many problems arise in using this information. First, there 
is the basic point that many types of products—planes, guns, and 
tanks—have changed so that the World War II data are not applicable 
or must be considerably adjusted. Then there is a classification prob- 
lem. Data of the World War II War Production Board are tangled be- 
cause of the many systems of classification employed. These were the 
original Census commodity codes, the several product codes used in 
operating the Production Requirements Plan, the Controlled Materials 
Plan code, and the so-called “B” product listing. It is hard to compare 
data because the item categories change from list to list and from year 
to year. 







Recently, efforts have been made to remedy some of these statistical 
difficulties. Steps have been taken to close gaps in the development 
of capacity information and data related to the input-output system. 
The National Production Authority and other defense agencies have 
inaugurated a series of plant operations reports which will yield valu- 
able information on current materials consumption by plants. 


MAJOR TECHNICAL PROBLEM AREAS 


Attention may be called to a few problems in balance sheet analysis. 

Circularity of Requirements.—One difficulty is the circularity of 
requirements; that is, the interdependence of requirements as among 
industries and between requirements and supply. Consider the drafting 
of a man into the Army. Military requirements are increased, civilian 
requirements decreased, the civilian labor force, and therefore the 
capacity of the nation to produce goods and services, is decreased. 
Another illustration is that coal production requires steel and trans- 
portation; transportation requires steel and coal; and steel production 
requires coal and transportation. 
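
The coal, steel, and transportation illustration can be made concrete with a small inter-industry calculation. The coefficients and final demands below are invented for illustration, and repeated substitution is only one simple way of resolving the circle; it is not presented as the procedure used by any agency.

```python
# DIRECT[i][j]: units of resource i consumed per unit of output of j (hypothetical).
DIRECT = {
    "coal":      {"steel": 0.3, "transport": 0.2},
    "steel":     {"coal": 0.1, "transport": 0.1},
    "transport": {"coal": 0.2, "steel": 0.1},
}
# Requirements of final users (military and civilian), in arbitrary units.
FINAL = {"coal": 10.0, "steel": 50.0, "transport": 20.0}


def gross_output(direct, final_demand, tol=1e-10, max_iter=10_000):
    """Find outputs x satisfying x[i] = final[i] + sum_j direct[i][j] * x[j].

    Starts from final demand and repeatedly adds the indirect requirements
    until the totals stop changing.
    """
    names = list(final_demand)
    x = dict(final_demand)
    for _ in range(max_iter):
        new = {
            i: final_demand[i] + sum(direct[i].get(j, 0.0) * x[j] for j in names)
            for i in names
        }
        if max(abs(new[i] - x[i]) for i in names) < tol:
            return new
        x = new
    raise RuntimeError("requirements did not converge")


totals = gross_output(DIRECT, FINAL)
```

Each total exceeds the corresponding final demand because every industry must also supply the others; the iteration settles only when the coefficients are small enough that each round of indirect requirements is smaller than the last.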

One way to break into this circle is through the use of assumptions 
worked into an inter-industry relations analysis or a Gross National 
Product projection. The structure of the GNP is based upon the 
stated size of the labor force, estimates of productivity, and assumed 
number of hours worked. It also reflects certain basic strategic and 
military assumptions as to the character, timing, and duration of 
mobilization, and other assumptions as to the derived demand for 
transportation and transportation equipment, electric power and 
generating equipment, general purpose industrial equipment, and so 
forth. 

Within this Gross National Product total are included estimates of 
the major classes of expenditures for consumer durables, non-durables, 
and services; for investment; and for government. These estimates 
are made consistent with stated general objectives with regard to the 
expansion of capacity. Most important for the purpose at hand are 
the estimates of the size of the military effort and of the degree to 
which civilian consumption may be limited by shortages in steel, 
plant capacity, or other limiting factors. 

In the GNP-assumptions technique, we must be cautious lest the
results be predetermined by the assumptions. We must always be 
alert in reviewing calculations submitted to us and assure ourselves 
that the estimates represent independent judgments and are not mere 
arithmetical projections of the assumptions. 





































































































































Flexibility in the Estimating System.—Mobilization planning must 
be flexible. It must permit the consideration and weighing of alterna- 
tives, so that the responsible government officials can evaluate the 
impact of proposed and alternative policies. Unfortunately, require- 
ments estimating is rather time-consuming and relatively rigid in its 
application. This is particularly true of military requirements. At the 
present time it takes many months to translate a strategic plan into 
a full set of military requirements and to test those requirements. The 
result is that if calculations are needed within a shorter period, the 
services merely adjust previously calculated requirements in an attempt 
to allow for changed assumptions, time periods, and conditions. 

On the civilian side, the government agencies have used more 
flexible techniques which permit rapid calculation and the considera- 
tion of alternatives. The price for this, of course, is substantially less 
detail in the calculations. For programming purposes, however, the 
information is adequate. 

This time and flexibility problem exists in much less degree for the 
detailed balance sheet program instituted in actual mobilization. It 
takes a long period of time to develop processes to the point where 
quick periodic balance sheets are prepared. But once the mechanism 
is perfected, data to cover short time periods can be produced with 
remarkable speed. Moreover, the existence of the procedure for the
detailed balance sheet will aid in perfecting the accuracy of the long-range
balance sheet and permit quicker calculations.

Civilian Consumer Requirements.—In developing estimates of civilian 
requirements there is much need for improvement. To date, we have 
been forced to rely to a large extent upon World War II experience, 
taking per capita consumption at the 1943 or 1944 level, adjusted for 
population increase and changes in the character of the economy since 
the war, and on calculations by the War Production Board’s Office of 
Civilian Supply of certain minimum standards of civilian consumption. 
The great utility of the 1944 level is that we know approximately what 
it involved in the way of a rationing policy, administrative difficulties, 
and consumer resistance. The problems of applying these standards to 
balance sheet calculations, however, are formidable. We have, there- 
fore, been forced to view the civilian requirements as, in effect, residual, 
and then try to appraise the effect on the economy of the curtailments 
indicated. This is inadequate. 

World War II experience will not resolve this problem. It was never 
satisfactorily met in World War II, perhaps because we didn’t need to 
meet it. Techniques remain to be developed. 

Regional Information.—In large part, the statistical analyses which
have been used for mobilization planning deal with national totals. 
But for manpower and electric power, regional information is vital. 
There is a wealth of information on the supply aspects of labor and 
power market areas; but we have been able to do little on regional 
requirements. Little is known of the geographic pattern of prime con- 
tract awards or of sub-contracting. We may know that manpower 
will be in short supply nationally, and yet not know where there may 
be local surpluses which could conceivably be transferred to other
regions or whose participation could be obtained by placing contracts
in those areas. 

Both manpower and electric power are likely soon to be in sufficiently 
short supply in certain regions to call for direction of additional defense 
work elsewhere. Special surveys may have to cover the local uses of 
these resources as a guide to what can be shifted or curtailed. In war- 
time, of course, more adequate regional and area data can be collected, 
and the present gaps partially filled. There will probably still be a need, 
however, to develop better techniques for measuring the impact of 
national programs upon regions and areas. 

Coordinating Data.—Coordination between the individual balance 
sheets for basic material, manpower, transportation, and electric 
power, with their varying units of measure, is difficult. Some progress 
has been made in integrating balance sheets for different resources; 
yet much more needs to be done. The problem is partly one of data, 


CONCLUDING REMARKS 


Much developmental work was done in the early stages of World 
War II. Increasingly, the trend has been to duplicate, with some modi- 
fications, the techniques, report forms, and operating procedures that 
finally evolved in 1944; and also, though this is a temporary expedient, 
to use 1944 data to a considerable extent. This is not to say that 1944 
methods were perfect, or in particular that they were exactly suited to 
a situation of partial mobilization. But there is natural reluctance to 
abandon a system that was proved to be workable. Some improve- 
ments and adjustments are being made and others will follow. There 
is reasonable assurance that before very long the statistical support 
for an effective programming and allocation system will be available. 





A LARGE-SAMPLE TEST OF THE HYPOTHESIS THAT ONE 
OF TWO RANDOM VARIABLES IS STOCHASTICALLY 
LARGER THAN THE OTHER 


ANDREW W. MARSHALL 
The Rand Corporation 


I. SUMMARY 


This paper presents a large-sample, non-parametric test using
grouped data. It applies when $x$ and $y$ are random variables with
continuous cumulative distribution functions (c.d.f.’s) $F(x)$ and $G(y)$.
A statistic $S$ based upon the sum of the differences, at selected points,
of the two sample c.d.f.’s, $F_m^*(x)$ and $G_n^*(y)$, is proposed for testing
the hypothesis $F(a) = G(a)$ for every $a$ against alternative hypotheses
in which one of two variables is stochastically larger than the other. A
variable $x$ is said to be stochastically larger than a variable $y$ if
$F(a) \le G(a)$ for every $a$, with the less than relation holding for some $a$. The
asymptotic power efficiency of the proposed test procedure has been
investigated in a preliminary way for a special case and has been found
to lie between 0.64 and 0.94, depending upon the number of class
intervals.


II. INTRODUCTION AND STATEMENT OF TEST PROCEDURE 


Let $x$ and $y$ be two random variables with continuous c.d.f.’s, $F(x)$
and $G(y)$ respectively. The variable $x$ will be said to be stochastically
larger than $y$ if $F(a) \le G(a)$ for every $a$, with the less than relation
holding for some $a$. We wish to test the hypothesis that $F(a) = G(a)$ for all
$a$ against the alternative that $x$ is stochastically larger than $y$. The
importance of such alternatives has been stressed by Mann and
Whitney [1], as it should be, for very often we want to decide not
whether two treatments, drugs, age distributions, etc. are the same or
different but whether one is better than another, or younger than
another. The most natural formulation of problems of the latter type
is often one in which alternative hypotheses to the null hypothesis of
identity are of the stochastically larger kind. Mann and Whitney
discuss a test (introduced by Wilcoxon) based upon the ranks of the $x$’s
and $y$’s. For all combinations of sample sizes up to $n = m = 8$ the
probability distribution of this test statistic has been tabulated. For
larger samples, the Mann-Whitney statistic is very nearly normally
distributed and the appropriate test procedure is described in their
paper. The application of this test, however, becomes very laborious
for moderate or large-size samples and an alternative test which would
effectively simplify the computation is desirable.

One very good large sample test of this hypothesis already exists in
the literature but seems to have been generally overlooked. The result
which is the basis for such a test is the one-sided analog of N. Smirnov’s
more familiar theorem on the distribution of $\max |G_n^*(a) - F_m^*(a)|$,
which has been rather fully treated in a now extensive literature;
readers of this Journal will know of this through Massey’s paper [12]
in a recent issue.

Theorem (Smirnov) [3, 4]. If we have two independent samples of
sizes $m$ and $n$ from populations defined by the two continuous c.d.f.’s
$F(x)$ and $G(y)$ respectively, where $F(a) = G(a)$ for every $a$, and if
$m, n \to \infty$ in such a way that $m/n \to r$, where $r$ is some fixed constant not
zero or infinite, then

$$\lim \Pr\left\{ \sqrt{N}\, \max \left[ G_n^*(a) - F_m^*(a) \right] > \lambda \right\} = e^{-2\lambda^2} \tag{1}$$

where $N = mn/(m+n)$, and $F_m^*(x)$, the sample cumulative distribution
function of the $m$ observations from the population having $F(x)$ as its
c.d.f., is defined by

$$F_m^*(x) = \begin{cases} 0, & x < x_1 \\ k/m, & x_k \le x < x_{k+1} \\ 1, & x \ge x_m \end{cases} \tag{2}$$

where $x_1 < \cdots < x_k < \cdots < x_m$. $G_n^*(y)$ is defined in an obviously
similar way for the sample of $n$ observations from the population with
c.d.f. $G(y)$. By $\max [G_n^*(a) - F_m^*(a)]$ is meant the maximum distance
by which the sample c.d.f. $G_n^*(y)$ ever exceeds the sample c.d.f. $F_m^*(x)$ when
$x = y$, over the whole range of values $x$ and $y$ may take. Since by definition
$G_n^*(a) = F_m^*(a) = 0$ for $a < \min(x_1, y_1)$ and $G_n^*(a) = F_m^*(a) = 1$ for
$a \ge \max(x_m, y_n)$, it follows that $\max [G_n^*(a) - F_m^*(a)] \ge 0$.
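
The sample c.d.f. of definition (2) and the one-sided maximum difference can be computed directly from ungrouped observations. The sketch below is a minimal illustration; the function names and the small data set are hypothetical.

```python
from bisect import bisect_right


def sample_cdf(sample):
    """Return the step function of definition (2) for the given observations."""
    xs = sorted(sample)
    size = len(xs)

    def cdf(a):
        # Number of observations not exceeding a, divided by the sample size.
        return bisect_right(xs, a) / size

    return cdf


def max_one_sided(x_sample, y_sample):
    """Largest value of G*(a) - F*(a), where F* belongs to x and G* to y.

    Both step functions change only at observed values, so it is enough to
    evaluate the difference at every observation in either sample.
    """
    f_star = sample_cdf(x_sample)
    g_star = sample_cdf(y_sample)
    points = sorted(set(x_sample) | set(y_sample))
    return max(g_star(a) - f_star(a) for a in points)


# Hypothetical observations in which the x values tend to be the larger.
d_plus = max_one_sided([3, 5, 7, 9], [1, 2, 4, 6])
```

At the largest observation both sample c.d.f.’s equal one, so the value returned is never negative, in agreement with the remark above.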

The usefulness of this theorem for constructing large sample tests in
connection with the types of substantive problems considered in this
paper is clear. With effective short cuts to lighten the computational
burden of finding $\max [G_n^*(a) - F_m^*(a)]$, really attractive tests
are at our command. From the point of view of merely accepting or
rejecting the null hypothesis, sequential computation procedures are
especially helpful in reducing computational labor. Even with grouped
data this test is a good indicator of discrepancies between populations
when samples are large. In this case the equality sign in (1) is replaced
by a less than or equal sign. This is intuitively clear since the true
maximum of the differences between the sample c.d.f.’s will have
occurred, with probability one, within some group interval rather than
at an end point. This systematic bias, even for moderately short class
intervals, may however be troublesome and it was to provide a more
exact test for grouped data that the present test was derived.

The test procedure suggested in the present paper is as follows: Suppose
we have two samples, one of $m$ independent observations $x_1, \ldots, x_m$
with a common continuous distribution function $F(x)$, the other of
$n$ independent observations $y_1, \ldots, y_n$ with a common continuous
distribution function $G(y)$. We are to test the hypothesis that $x$ and $y$
are identically distributed against the alternative hypothesis that $x$
is stochastically larger than $y$. If $F_m^*(x)$ and $G_n^*(y)$, the two sample
cumulative distribution functions, are defined as in (2), we let


$$z_i = \left[ G_n^*(a_i) - F_m^*(a_i) \right]; \qquad i = 1, \ldots, j$$


where the $a_i$ are $j$ arbitrary points which define $j+1$ intervals into which
the $m$ and $n$ sample values fall. If the data one is working with are
already grouped then, of course, the possible choices of the $a_i$ are
considerably restricted; but prior grouping of the data, though perhaps
inconvenient for this reason, does not invalidate the test procedure if the
samples have been drawn from populations with continuous distribution
functions. It is assumed here that if the sample data are grouped
the grouping in the two samples is, or can be made, identical. These
new variables, the $z_i$’s, are distributed as follows under the null
hypothesis [4]:

$$E(z_i) = 0,$$

$$E(z_i^2) = \left( \frac{1}{m} + \frac{1}{n} \right) p_i q_i,$$

$$E(z_i z_k) = \left( \frac{1}{m} + \frac{1}{n} \right) p_i q_k,$$

$$i = 1, \ldots, j-1; \quad k = 2, \ldots, j; \quad i < k,$$


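where $p_i = F(a_i) = G(a_i)$, and $q_i = 1 - p_i$. Our test statistic is to be $S$,
the sum of the $z_i$’s. It is shown in section 4 that when the null
hypothesis is true $S$ is normally distributed with mean zero and variance


$$\operatorname{Var}(S) = \left( \frac{1}{m} + \frac{1}{n} \right) \left( \sum_{i=1}^{j} p_i q_i + 2 \sum_{i=1}^{j-1} \sum_{k=i+1}^{j} p_i q_k \right). \tag{4}$$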


Thus a test procedure of significance level $\alpha$ is obtained from the
decision rule: reject the null hypothesis whenever


$$S > K_\alpha \sqrt{\operatorname{Var}(S)},$$

where $K_\alpha$ is the standard normal deviate exceeded with probability $\alpha$,
i.e., the $K_\alpha$ such that

$$\frac{1}{\sqrt{2\pi}} \int_{K_\alpha}^{\infty} e^{-t^2/2}\, dt = \alpha.$$
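
The deviate $K_\alpha$ and the decision rule can be evaluated with the Python standard library, as in the sketch below. The function names are hypothetical, and the values of $S$ and its variance are taken as given rather than computed.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(alpha):
    """K_alpha: the standard normal deviate exceeded with probability alpha."""
    return NormalDist().inv_cdf(1.0 - alpha)


def reject_null(s, var_s, alpha=0.05):
    """Apply the one-sided rule: reject when S exceeds K_alpha * sqrt(Var(S))."""
    return s > critical_value(alpha) * sqrt(var_s)


k_05 = critical_value(0.05)
```

The rule is one-sided by design, since only large positive values of $S$ point toward the alternative that $x$ is stochastically larger than $y$.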


In general we will not know the correct values of the $p_i$ and therefore
will be unable to compute the variance of $S$ from (4). If we substitute
in place of the $p_i$ their maximum likelihood estimates $\hat{p}_i$ based upon the
pooled samples it can be shown that, as in the case of the ordinary
$\chi^2$ test, the statistic $S$ will have the same limiting distribution. The
correct estimates of the $p_i$ are given by

$$\hat{p}_i = \frac{n G_n^*(a_i) + m F_m^*(a_i)}{m + n}; \qquad i = 1, \ldots, j.$$

The practical application of the test procedure given above is somewhat
less restricted than the usual $\chi^2$ test. Sample size requirements are
essentially the same for the two tests but the $S$ statistic requires only
that $a_1$ and $a_j$ be chosen so that


$$\min\,[m F(a_1) + n G(a_1)] \ge 10,$$

$$\min\,\{ m[1 - F(a_j)] + n[1 - G(a_j)] \} \ge 10.$$


This is a rough rule of thumb which should insure a good approxima- 
tion to normality even for moderate sized samples. As regards the 
question of the most suitable number of class intervals, i.e., the best 
size of j, the indications are that the power efficiency of the test is 
much improved when j is equal to five or six rather than one or two, 
but improvement from then on is slow. A preliminary investigation of 
the power efficiency of the S test in the case of normal slippage of 
means with known variance has been undertaken and will be reported 
in a later publication. The findings are essentially that the power 
efficiency of the test is quite good, especially for $j \ge 5$: for $j = 1$ the
asymptotic power efficiency is 0.64, for $j = 5$ it is 0.91, and for $j = 10$ it is
0.94.







III. AN EXAMPLE 


In the course of a certain study it was required to decide whether 
persons with a positive family history of a particular disease, if they 
themselves contract the disease, have an earlier age of onset than 
persons with a negative family history. It seems reasonable to trans- 
late the hypothesis of earlier onset into the hypothesis that the age of 
onset of those cases with a negative family history is stochastically 
larger than the age of onset of those cases with a positive family history.
A total of 2502 cases of the disease were obtained with data on
the age of onset and the absence or presence of the disease in past or 
present members of the family. The ages of onset in the original data 
were given by one year intervals but for purposes of economy in 
processing, analyzing and presenting the data we have grouped the 
ages of onset into five year intervals. The null hypothesis then is that 
the age-of-onset distribution of those cases with a family history and 
those without is identical. The data are summarized in Table 1. 


TABLE 1

                 (1)               (2)               (1)+(2)
Age of           Cumulated         Cumulated         Pooled
Onset            Positive Cases    Negative Cases    Sample       $\hat{p}_i$

0-49                  12                32                44        .0176
50-54                 28                93               121        .0484
55-59                 64               242               306        .1223
60-64                118               495               613        .2450
65-69                175               850              1025        .4097
70-74                276              1276              1552        .6203
75-79                360              1664              2024        .8090
80-84                399              1942              2341        .9357
85-over              415              2087              2502




















In order to obtain the sum of the $z_i$’s one has only to cumulate column
(1) again, divide by 415, cumulate column (2), divide by 2087, and
subtract the second figure from the first. We find that


$$S = \sum_{i=1}^{8} z_i = \frac{1847}{415} - \frac{8681}{2087} = 0.2910.$$





To find the variance of $S$ we have


$$\left( \frac{1}{415} + \frac{1}{2087} \right) = 0.00289,$$

$$\sum_{i=1}^{8} p_i q_i = 1.0477,$$

$$\sum_{i=1}^{7} \sum_{k=i+1}^{8} p_i q_k = 1.2183.$$

Substituting these into (4) we have

$$\operatorname{Var}(S) = 0.00289\,(1.0477 + 2 \times 1.2183) = 0.0101.$$
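
The arithmetic of this example can be checked from the cumulated counts of Table 1. The sketch below is illustrative only: the function name is hypothetical, the pooled estimates replace the unknown $p_i$ as described in Section II, and unrounded intermediate values are carried, so the results agree with the printed figures only to rounding. It also forms the ratio of $S$ to its standard deviation and the corresponding upper-tail normal probability.

```python
from math import sqrt
from statistics import NormalDist

# Cumulated case counts at the eight interior class boundaries (Table 1).
positive_cum = [12, 28, 64, 118, 175, 276, 360, 399]      # n = 415 in all
negative_cum = [32, 93, 242, 495, 850, 1276, 1664, 1942]  # m = 2087 in all
n, m = 415, 2087


def s_test(y_cum, n_y, x_cum, m_x):
    """Return S, Var(S) from (4), S / sqrt(Var(S)), and the upper-tail level."""
    z = [g / n_y - f / m_x for g, f in zip(y_cum, x_cum)]
    s = sum(z)
    # Pooled estimates of the p_i, and q_i = 1 - p_i.
    p = [(g + f) / (n_y + m_x) for g, f in zip(y_cum, x_cum)]
    q = [1.0 - p_i for p_i in p]
    j = len(p)
    diag = sum(p_i * q_i for p_i, q_i in zip(p, q))
    cross = sum(p[i] * q[k] for i in range(j) for k in range(i + 1, j))
    var_s = (1.0 / m_x + 1.0 / n_y) * (diag + 2.0 * cross)
    ratio = s / sqrt(var_s)
    level = 1.0 - NormalDist().cdf(ratio)
    return s, var_s, ratio, level


s, var_s, ratio, level = s_test(positive_cum, n, negative_cum, m)
```

The ninth class boundary is omitted because both sample c.d.f.’s equal one there and contribute nothing to $S$.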
Then

$$\frac{S}{\sqrt{\operatorname{Var}(S)}} = \frac{0.2910}{0.1005} = 2.90.$$

The probability level, $\alpha$, of this test is obtained from the upper tail of
the standard normal distribution and in our example is 0.002.

As mentioned earlier, the Smirnov test is also applicable and from
Table 1 we obtain Table 2, which contains the information relevant
for this purpose.





TABLE 2

Sample c.d.f.       Sample c.d.f.       Difference
Positive Cases      Negative Cases      $G_n^*(a_i) - F_m^*(a_i)$
$G_n^*(a_i)$        $F_m^*(a_i)$

0.0289              0.0153              0.0136
0.0675              0.0446              0.0229
0.1542              0.1160              0.0382
0.2843              0.2372              0.0471
0.4217              0.4073              0.0144
0.6651              0.6114              0.0537
0.8675              0.7973              0.0702
0.9614              0.9305              0.0309
1.0000              1.0000              0.0000














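$$\max_i \left[G_m^*(a_i) - F_n^*(a_i)\right] = 0.0702,$$

$$\sqrt{\frac{nm}{n+m}} = 18.6055,$$

$$18.6055 \times 0.0702 = 1.3061,$$

$$P\left\{\sqrt{\frac{nm}{n+m}}\,\max_i \left[G_m^*(a_i) - F_n^*(a_i)\right] \ge 1.3061\right\} \le e^{-2(1.3061)^2} = 0.033.$$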


The S test, as was expected, indicates a much lower probability level 
than the bound given by the Smirnov test, though both reject the null 
hypothesis at the 5 per cent level of significance. We conclude that it 
is very unlikely that the two groups of patients have the same distribu- 
tion of age of onset and tentatively accept the alternative hypothesis 
that the age of onset is stochastically larger (older) for persons with a 
negative family history than for persons with a positive family history. 
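
The arithmetic of this example can be checked with a short script. The sketch below is only an illustration, not part of the original analysis: it takes the cumulated counts from Table 1, forms the z_i, and recomputes S, Var(S), the standardized deviate, and the Smirnov bound. The variable names are hypothetical, and the normal tail area is taken from the standard library's `math.erfc` rather than from a printed table, so the last digits may differ slightly from the rounded figures in the text.

```python
import math

# Cumulated counts from Table 1 (classes 0-49, 50-54, ..., 85-over).
pos = [12, 28, 64, 118, 175, 276, 360, 399, 415]         # column (1)
neg = [32, 93, 242, 495, 850, 1276, 1664, 1942, 2087]    # column (2)
m, n = pos[-1], neg[-1]                                  # 415 and 2087

# Sample c.d.f.s and their differences z_i, as in Table 2.
G = [c / m for c in pos]
F = [c / n for c in neg]
z = [g - f for g, f in zip(G, F)]

# S test: the last z_i is zero, so summing over all classes is harmless.
S = sum(z)

# Pooled c.d.f. p_i for the first eight classes, with q_i = 1 - p_i.
p = [(a + b) / (m + n) for a, b in zip(pos, neg)][:-1]
q = [1 - v for v in p]
diag = sum(pi * qi for pi, qi in zip(p, q))
cross = sum(p[i] * q[k] for i in range(len(p)) for k in range(i + 1, len(p)))
var_S = (1 / m + 1 / n) * (diag + 2 * cross)

deviate = S / math.sqrt(var_S)
upper_tail = 0.5 * math.erfc(deviate / math.sqrt(2))     # P(Z > deviate)

# Smirnov test: largest difference, scaled, and the exponential bound.
D = max(z)
lam = math.sqrt(m * n / (m + n)) * D
bound = math.exp(-2 * lam ** 2)

print(f"S = {S:.4f}, Var(S) = {var_S:.4f}, deviate = {deviate:.2f}")
print(f"upper tail = {upper_tail:.4f}, Smirnov bound = {bound:.3f}")
```

Because the script carries full precision instead of the four-decimal table entries, the scaled Smirnov statistic comes out a little below 1.3061, but the bound still rounds to 0.033.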
IV. DERIVATION OF THE DISTRIBUTION OF S 


The test procedure will first be derived for the single sample case 
with the true distribution function known and then with minor changes 
the result will be shown to hold for the two sample case. In the latter 
case one need only assume that both samples come from the same 
population. 

Let x_1, x_2, ..., x_n be a sample of n mutually independent random
variables with a common continuous distribution function F(x). We
may define the sample cumulative distribution function F_n*(x) as in
(2) and choose j arbitrary points, j < n, which define j+1 intervals on
the x axis. Let us denote these points by a_1, ..., a_i, ..., a_j. Since
the variable x has a continuous distribution, the probability is zero
that any of our n sample values is exactly equal to any of the a_i. If for
one reason or another such cases arise in practice it will be best to
count one half of an observation in each of the two adjacent groups. We
form


$$z_i = \left[F_n^*(a_i) - F(a_i)\right].$$

These new variables are distributed as follows [4]:

$$E(z_i) = 0,$$

$$E(z_i^2) = \frac{F(a_i)\,[1 - F(a_i)]}{n},$$

$$E(z_i z_k) = \frac{F(a_i)\,[1 - F(a_k)]}{n},$$


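where the last expression holds for i = 1, ..., j-1; k = 2, ..., j and
i < k. In the limit as n → ∞ the z_i's have a joint normal multivariate
distribution. If we let

$$z = \begin{bmatrix} z_1 \\ \vdots \\ z_j \end{bmatrix} \quad \text{and} \quad p_i = F(a_i) \ \text{with} \ q_i = 1 - p_i,$$

the joint normal multivariate distribution is completely characterized
by the expected value of the vector z and the covariance matrix A.
From (3) we have

$$E(z) = 0 \quad \text{and} \quad A = \begin{bmatrix} \dfrac{p_1 q_1}{n} & \dfrac{p_1 q_2}{n} & \cdots & \dfrac{p_1 q_j}{n} \\ \dfrac{p_1 q_2}{n} & \dfrac{p_2 q_2}{n} & \cdots & \dfrac{p_2 q_j}{n} \\ \vdots & \vdots & & \vdots \\ \dfrac{p_1 q_j}{n} & \dfrac{p_2 q_j}{n} & \cdots & \dfrac{p_j q_j}{n} \end{bmatrix}. \tag{6}$$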








Although the limiting distribution of the test statistic
$S = \sum_{i=1}^{j} z_i$ can be derived directly from the expressions for
E(z) and A, it is more convenient to use the characteristic function
(c.f.) of the joint distribution of the z_i's for the purpose. The
characteristic function of the z_i's given (5) and (6) is

$$\phi(t) = e^{-\frac{1}{2} t' A t}$$

where t' = (t_1, t_2, ..., t_j). Using a theorem concerning linear
transformations of normally distributed variables ([5] Cramér, 24.4) and
their characteristic functions it is easily established that
$\sum_{i=1}^{j} z_i$ has the c.f.

$$\psi(u) = e^{-\frac{1}{2} u^2 C'AC}$$

where C' = (1, ..., 1) is a 1×j unit vector. This is, of course, the
proper choice of the linear transformation C since
$C'z = \sum_{i=1}^{j} z_i$. Thus S is in the limit normally distributed
with mean zero and variance

$$\sigma_S^2 = C'AC = \frac{1}{n}\left(\sum_{i=1}^{j} p_i q_i + 2 \sum_{i=1}^{j-1} \sum_{k=i+1}^{j} p_i q_k\right). \tag{8}$$



This derivation makes implicit use of a theorem about the limiting 
distribution of continuous functions of random variables. The inter- 
change of function and limiting operations is valid in this case. 
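
The step from the covariance matrix A to the variance of S is easy to verify numerically. The following sketch is an illustration under assumed inputs: the values of p_i = F(a_i) and the sample size n are hypothetical, and the function names are not from the paper. It builds A entry by entry from the moments given above, sums all its entries (which is C'AC for C' = (1, ..., 1)), and compares the result with the closed form labelled (8).

```python
def covariance_matrix(p, n):
    """A[i][k] = p_min(i,k) * q_max(i,k) / n, with q = 1 - p.

    For i <= k this is F(a_i) * [1 - F(a_k)] / n, and A is symmetric.
    """
    j = len(p)
    return [[p[min(i, k)] * (1 - p[max(i, k)]) / n for k in range(j)]
            for i in range(j)]


def variance_of_S(p, n):
    """Closed form: (1/n) * (sum_i p_i q_i + 2 * sum_{i<k} p_i q_k)."""
    j = len(p)
    diag = sum(pi * (1 - pi) for pi in p)
    cross = sum(p[i] * (1 - p[k]) for i in range(j) for k in range(i + 1, j))
    return (diag + 2 * cross) / n


p = [0.2, 0.5, 0.9]   # hypothetical F(a_1) < F(a_2) < F(a_3)
n = 50                # hypothetical sample size

A = covariance_matrix(p, n)
quadratic_form = sum(sum(row) for row in A)   # C'AC with C' = (1, ..., 1)

print(quadratic_form, variance_of_S(p, n))
```

The two printed numbers agree up to rounding error, which is all that the identity C'AC = (1/n)(Σ p_i q_i + 2 ΣΣ p_i q_k) asserts.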

In the two sample case let

$$z_i = G_m^*(a_i) - F_n^*(a_i);$$

then under the null hypothesis that G(a) = F(a)





AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1951 
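$$E(z_i) = 0,$$

$$E(z_i^2) = \left(\frac{1}{m} + \frac{1}{n}\right) p_i q_i,$$

$$E(z_i z_k) = \left(\frac{1}{m} + \frac{1}{n}\right) p_i q_k,$$

where i and k have the same range as before and p_i again is equal to
F(a_i) = G(a_i). It is now easily shown that $\sum_{i=1}^{j} z_i = S$ is
normally distributed with mean zero and variance

$$C'AC = \left(\frac{1}{m} + \frac{1}{n}\right)\left(\sum_{i=1}^{j} p_i q_i + 2 \sum_{i=1}^{j-1} \sum_{k=i+1}^{j} p_i q_k\right). \tag{10}$$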
THE DISTRIBUTION OF THE RANGE IN SAMPLES 
FROM A DISCRETE RECTANGULAR POPULATION 


PAUL R. RIDER 
Washington University and Flight Research Laboratory (U.S.A.F.) 


By a discrete rectangular population is meant a set of values 

$$a,\ a+h,\ a+2h,\ \ldots,\ a+(N-1)h,$$

each of which occurs with equal probability, 1/N. There is no loss of 
generality in assuming a = 0, h = 1, in which case the above set of 
values becomes the set of N consecutive integers, 0, 1, 2, ..., N-1. 

In an earlier paper¹ I gave, somewhat incidentally, the distribution 
of ranges of samples of 4 from a discrete rectangular population of 10 
classes. The present paper derives the distribution of ranges of samples 
of any size from a discrete rectangular distribution of any number of 
classes. 

Let us consider the set of R+1 consecutive integers, 0, 1, 2, ..., R. 
The range of this set is R. We wish to determine the number of different 
permutations containing n digits and having a range of R that can be 
extracted from this set. Repetitions are allowed. 

The total number of permutations of these R+1 digits is (R+1)^n. 
But unless both 0 and R appear in the permutation the range of the 
permutation is not R. The number of permutations which do not contain 
0 is R^n. This is also the number which do not contain R. However, 
(R-1)^n permutations contain neither 0 nor R. Thus, the number 
of permutations having a range of R is readily seen to be 

$$(R+1)^n - 2R^n + (R-1)^n, \qquad R \ne 0. \tag{1}$$

This is a polynomial of degree n-2, containing only odd powers of R 
if n is odd and only even powers if n is even. 

Let us now consider the set of N consecutive integers, 

$$0,\ 1,\ 2,\ \ldots,\ N-1 \qquad (N-1 \ge R). \tag{2}$$

The number of sets of R+1 consecutive integers in this set is obviously 
N-R. Consequently, to obtain the total number, f(R), of those 
permutations which have a range of R we must multiply (1) by N-R, 
obtaining 

$$f(R) = \left[(R+1)^n - 2R^n + (R-1)^n\right](N-R), \qquad R \ne 0. \tag{3}$$

Finally, to get the probability p(R) of obtaining a range R in a sample 





1 On the distribution of the ratio of mean to standard deviation in small samples from non-normal 
universes. Biometrika, Vol. 21 (1929), pp. 124-143. 
of n from a discrete rectangular population composed of N consecutive 
integers, we must divide (3) by the total number of permutations of n 
from the set (2), viz., N^n. This gives 

$$p(R) = \left[\left(\frac{R+1}{N}\right)^n - 2\left(\frac{R}{N}\right)^n + \left(\frac{R-1}{N}\right)^n\right](N-R), \qquad R \ne 0. \tag{4}$$

For the exceptional case R = 0 we have 

$$f(0) = N, \qquad p(0) = \frac{N}{N^n} = \frac{1}{N^{n-1}}. \tag{5}$$
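
The counting argument can be checked by direct enumeration for small cases. The sketch below is an illustration only; the function names are hypothetical. It evaluates p(R) from the count in (3) and the exceptional case (5), and compares it with a brute-force tally over all N^n ordered samples.

```python
from itertools import product


def range_probability(R, n, N):
    """p(R) for a sample of n from the integers 0, 1, ..., N-1."""
    if R == 0:
        return N / N ** n                      # exceptional case (5)
    count = ((R + 1) ** n - 2 * R ** n + (R - 1) ** n) * (N - R)   # f(R) in (3)
    return count / N ** n


def range_probability_brute_force(R, n, N):
    """Tally every ordered sample; feasible only for small n and N."""
    hits = sum(1 for s in product(range(N), repeat=n) if max(s) - min(s) == R)
    return hits / N ** n


for R in range(5):
    print(R, range_probability(R, 3, 5), range_probability_brute_force(R, 3, 5))
```

The separate branch for R = 0 matters: putting R = 0 into the general count gives 1 + (-1)^n, which is 2 or 0 rather than 1, and this is why the text treats that case on its own.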

Table 1 gives the value of p(R) for samples of size n (n = 2 to 10 
inclusive) from a discrete rectangular population of 10 classes. For 
purposes of comparison the corresponding probabilities are given for 
samples from the continuous rectangular population p(x) = 0.1, 
0 < x < N. These probabilities were obtained by integrating the well 
known frequency function for the range of samples of n from such a 
population, viz., 

$$\phi(R)\,dR = n(n-1)\left(\frac{R}{N}\right)^{n-2}\left(1 - \frac{R}{N}\right)\frac{dR}{N},$$

in which, of course, we set N = 10. The limits of integration used were 
0, 0.5, 1.5, 2.5, ..., 8.5, 9.5, 10. 
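
The integration described here has a closed form, so the continuous comparison values can be reproduced without numerical quadrature. The sketch below assumes the standard range density for n draws from a uniform distribution on (0, N), namely n(n-1)R^(n-2)(N-R)/N^n, whose integral from 0 to R is n r^(n-1) - (n-1) r^n with r = R/N. The function names are hypothetical.

```python
def continuous_range_cdf(R, n, N):
    """P(range <= R) for n draws from the uniform density 1/N on (0, N)."""
    r = R / N
    return n * r ** (n - 1) - (n - 1) * r ** n


def interval_probabilities(n, N=10):
    """Probabilities over the limits 0, 0.5, 1.5, ..., N - 0.5, N."""
    limits = [0.0] + [i + 0.5 for i in range(N)] + [float(N)]
    return [continuous_range_cdf(b, n, N) - continuous_range_cdf(a, n, N)
            for a, b in zip(limits, limits[1:])]


for n in (2, 3, 4):
    print(n, [round(v, 4) for v in interval_probabilities(n)])
```

With these limits there are N + 1 intervals, the last one running from N - 0.5 to N, so the list for N = 10 has eleven entries that sum to one.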

It is of interest to investigate the probability of obtaining a sample 
range equal to the population range. For the discrete rectangular 
population of N classes this range is N-1, and the probability of 
obtaining it in a sample is 

$$p(N-1) = 1 - 2\left(1 - \frac{1}{N}\right)^n + \left(1 - \frac{2}{N}\right)^n. \tag{6}$$

If we set p(N-1) = α and solve for n the result will be the size of sample 
which must be drawn so that in a proportion α of samples the range 
will be equal to the population range. For example, if we take α = 0.90, 
then 90 per cent of samples of size n will have a range N-1. For 
several values of N the following results were obtained: 

N |  2 |  3 |  4 |  5 | 10 | 20 |  50
n |  5 |  8 | 11 | 14 | 29 | 58 | 148

Thus, n seems to increase linearly with N, with a few slight 
irregularities. In fact, for the results shown, the value of n is given 
by 3N-1 or 3N-2 in every case. We therefore surmise that for the sample 
range to equal the population range with probability 0.90, the sample 
size must be nearly three times the number of classes in the population. 

In confirmation of this conjecture we set n = kN in (6), which then 
becomes 

$$p(N-1) = 1 - 2\left(1 - \frac{k}{n}\right)^n + \left(1 - \frac{2k}{n}\right)^n. \tag{7}$$

When the sample size n increases indefinitely, with k fixed, (7) 
approaches the limiting form 

$$p(N-1) = \left(1 - e^{-k}\right)^2. \tag{8}$$

If we set this expression equal to α and solve for k, we get 
k = -log(1 - √α). For α = 0.90, k has the value 2.970. Thus, if we wish 
a 90 per cent probability that the sample range will equal the 
population range, the sample size should be nearly three times the 
number of classes in the population. Unfortunately, this is of no 
assistance in designing a sample, since it requires the very 
information which the sample is supposed to elicit. 
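
Both the tabulated sample sizes and the limiting ratio are easy to recompute. The sketch below is an illustration with hypothetical function names: it searches for the smallest n at which the probability of a full range, 1 - 2(1 - 1/N)^n + (1 - 2/N)^n, reaches α, and evaluates k = -log(1 - √α) with the natural logarithm.

```python
import math


def prob_full_range(n, N):
    """Probability that a sample of n from N classes has range N - 1."""
    return 1 - 2 * (1 - 1 / N) ** n + (1 - 2 / N) ** n


def smallest_sample_size(N, alpha=0.90):
    """Smallest n for which the full range appears with probability >= alpha."""
    n = 2
    while prob_full_range(n, N) < alpha:
        n += 1
    return n


def limiting_ratio(alpha):
    """k = n/N in the limit, from (1 - exp(-k))**2 = alpha."""
    return -math.log(1 - math.sqrt(alpha))


for N in (2, 3, 4, 5, 10, 20, 50):
    print(N, smallest_sample_size(N))
print(round(limiting_ratio(0.90), 3))
```

The search is linear in n, which is adequate here because the answer is close to 3N; for very large N a bracketing search would be a reasonable substitute.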

Since N = 10 gives us the population of random decimal digits, we 
can compare observed distributions of the range with the theoretical 
distribution by making use of a table of random sampling numbers. 

It seemed desirable to determine whether the distribution is sensi- 
tive in detecting lack of randomness in such a table and to this end a 
study was made of certain blocks from Tables of Random Sampling 
Numbers by M. G. Kendall and B. Babington Smith. In these tables 
the following blocks of 1000 digits failed to pass certain tests for 
randomness: 


Block Failed to pass 
26 Serial test 
47 Poker test 
49 Frequency and serial tests 
81 Serial test 
90 Serial test 


These five blocks were used in obtaining experimental distributions of 
the range. 

Each thousand is divided into 250 blocks of four digits each. Thus, 
each thousand furnished a distribution of 250 values of the range for 
samples of four. These distributions were subjected to a chi-square 
test against the corresponding theoretical distribution and in no case 
was the value of chi-square significant. 

The tables are also subdivided into blocks of two digits each, so 
that each thousand furnished a distribution of the range for samples of 
two. The chi-square test was again applied. Significance was found in 
only one case: in the 26th thousand the value of chi-square was 
significant at about the 2 per cent level. 
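
The procedure just described can be sketched in a few lines. Everything specific in the example below is an assumption made for illustration: the digits come from a seeded pseudo-random generator standing in for a printed block of 1000 digits, the function names are hypothetical, and no classes are pooled, although in practice classes with very small expected frequencies (such as R = 0 and R = 1 for samples of four) would normally be combined before a chi-square test. The text does not say how the classes were grouped.

```python
import random


def theoretical_range_probs(n, N=10):
    """p(R), R = 0..N-1, for samples of n from N equally likely digits."""
    probs = [N / N ** n]
    probs += [((R + 1) ** n - 2 * R ** n + (R - 1) ** n) * (N - R) / N ** n
              for R in range(1, N)]
    return probs


def range_counts(digits, n, N=10):
    """Split the digits into consecutive blocks of n and tally each range."""
    counts = [0] * N
    for start in range(0, len(digits) - n + 1, n):
        block = digits[start:start + n]
        counts[max(block) - min(block)] += 1
    return counts


def chi_square(observed, expected):
    """Pearson's statistic, summed over all classes without pooling."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


rng = random.Random(1951)                        # hypothetical stand-in for a table
digits = [rng.randrange(10) for _ in range(1000)]

observed = range_counts(digits, 4)               # 250 blocks of four digits
expected = [250 * p for p in theoretical_range_probs(4)]
statistic = chi_square(observed, expected)

print(observed)
print([round(e, 2) for e in expected])
print(round(statistic, 2))
```

Calling `range_counts(digits, 2)` with `theoretical_range_probs(2)` gives the corresponding comparison for samples of two, with 500 blocks per thousand digits.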


TABLE 1. Probability of range R in samples of n from the discrete 
rectangular population p(x) = 0.1; x = 0, 1, ..., 9 

R     n=2     n=3     n=4     n=5     n=6     n=7     n=8     n=9     n=10
0    .1000   .0100   .0010   .0001   .0000   .0000   .0000   .0000   .0000
1    .1800   .0540   .0126   .0027   .0006   .0001   .0000   .0000   .0000
2    .1600   .0960   .0400   .0144   .0048   .0015   .0005   .0001   .0000
3    .1400   .1260   .0770   .0399   .0189   .0085   .0037   .0016   .0007
4    .1200   .1440   .1164   .0792   .0490   .0285   .0160   .0087   .0046
5    .1000   .1500   .1510   .1275   .0975   .0700   .0482   .0322   .0210
6    .0800   .1440   .1736   .1752   .1598   .1367   .1118   .0886   .0685
7    .0600   .1260   .1770   .2079   .2205   .2190   .2078   .1908   .1708
8    .0400   .0960   .1540   .2064   .2496   .2824   .3051   .3187   .3244
9    .0200   .0540   .0974   .1467   .1993   .2531   .3068   .3594   .4100

TABLE 2. Probability of range R in samples of n from the continuous 
rectangular population p(x) = 0.1, 0 ≤ x ≤ 10 

R     n=2      n=3     n=4      n=5     n=6     n=7     n=8     n=9     n=10
0    .0975    .0062   .0005    .0006   .0000   .0000   .0000   .0000   .0000
1    .1800    .0525   .0115    .0022   .0004   .0001   .0000   .0000   .0000
2    .1600    .0975   .0388    .0134   .0042   .0013   .0004   .0001   .0000
3    .1400    .1255   .0757    .0384   .0177   .0077   .0032   .0013   .0005
4    .1200    .1435   .1150    .0772   .0469   .0267   .0146   .0077   .0040
5    [illeg.] .1495   [illeg.] .1250   .0944   .0667   .0451   .0294   .0188
6    .0800    .1435   .1720    .1722   .1555   .1314   .1059   .0826   .0627
7    .0600    .1255   .1753    .2044   .2149   .2111   .1990   .1793   .1581
8    .0400    .0955   .1522    .2024   .2427   .2716   .2901   .2991   .3003
9    .0200    .0535   .0955    .1422   .1906   .2390   .2856   .3293   .3696
REPRINTS OF ABSTRACTS IN STATISTICAL METHODOLOGY 
Edited by Max A. Woodbury, Princeton University 


POPULATION INDEX 
1951 


Galvani, Luigi. 

Some critical remarks on the "method of 
the standard population." (Quelques 
remarques critiques sur la "méthode de la 
population type.") Bulletin de l'Institut 
International de Statistique 32(2):368-376. 
1950. 

See also: Considerazioni sul metodo della 
popolazione tipo. Statistica 9(3):309-322. 
July-Sept., 1949. 


1094 Hansen, Morris H., and Deming, W. 
Edwards. 

On an important limitation to the use of 
data from samples. Bulletin de l'Institut 
International de Statistique 32(2):214-219. 
1950. 


1096 Kellerer, Hans. 

Random sampling methods in the official 
German statistics since 1946. 
(Stichprobenverfahren in der amtlichen 
deutschen Statistik seit 1946.) Bulletin de 
l'Institut International de Statistique 
32(2):245-255. 1950. 


1027 Midzuno, Hiroshi. 

An outline of the theory of sampling sys- 
tems. Annals of the Institute of Statistical 
Mathematics 1(2):149-156. 1950. 


1028 Midzuno, Hiroshi. 

A survey method using two kinds of sur- 
veys. Annals of the Institute of Statistical 
Mathematics 1(2):123-124. 1950. 


Mortara, Giorgio. 

Methods used in Brazil for the 
reconstruction of missing vital statistics data 
with the aid of census data. (Sur les 
méthodes appliquées pour la reconstitution du 
mouvement de la population du Brésil à 
l'aide des données des recensements.) 
Bulletin de l'Institut International de 
Statistique 32(2):350-358. 1950. 

Also published as: Sôbre os métodos 
aplicados para a reconstituição do movimento 
da população do Brasil, com a ajuda dos 
dados dos recenseamentos. Revista 
Brasileira de Estatística 11(41):23-30. 
Jan.-March, 1950. 


PSYCHOLOGICAL ABSTRACTS 
1951 


58. Brown, George W. (Rand Corp., Santa Monica, Calif.) 
Basic principles for construction and application of 
discriminators. J. clin. Psychol., 
1950, 6, 58-61.—“This paper deals with the 
classification of an individual into one of two or more 
categories, on the basis of observations made on the 
individual together with a knowledge of the 
statistical distributions of the [illegible] for 
individuals within each of the possible categories;” 
for example, the selection or rejection of pilot 
trainees might be made on the knowledge of their 
performance on a series of tests and of the relation of 
these tests to the criterion, successful completion of 
training. Neyman's and Pearson's theory [illegible] 
this problem and its extension to the problems of 
classification into more than two categories are 
elaborated mathematically.—L. B. Heathers. 

59. Cronbach, Lee J. (U. Illinois, Urbana.) 
Statistical methods for multi-score tests. J. clin. 
Psychol., 1950, 6, 21-25.—In testing we attempt to 
place a person within a k-space, where k is the 
number of traits being measured. One may string 
the k-space along a line and compare single scores 
but this overlooks interactions among the scores and 
raises the question as to the probable significance of 
obtained differences, particularly where the samples 
are compared on correlated variables. One may try 
to treat several scores simultaneously through using 
multiple regression and discriminant function 
methods but these may produce clinically non-meaningful 
results. One may deal with only two or three, 
preferably reliable, variables at once, utilizing 
chi-square to test the significance of differences. Or one 
may use matching procedures, creating an 
experimental design to test several aspects of a [illegible] 
rather than using the description as a unit.—L. B. 
Heathers. 


65. Hodges, J. L., Jr., & Lehmann, E. L. (U. 
California, Berkeley.) Some problems in minimax 
point estimation. Ann. math. Statist., 1950, 21, 
182-197.—The problem of point estimation is considered 
in terms of risk functions without the customary 
restriction to unbiased estimates. Whenever the loss 
is a convex function of the estimate, it suffices from 
the risk viewpoint to consider only nonrandomized 
estimates. The minimax estimates are found 
explicitly, using the squared error loss, for a number of 
specific problems. Several minimax prediction 
problems are solved. Formulae and mathematical 
proofs are included.—G. C. Carter. 


[Entry number and author illegible.] Estimation of the 
variance of the bivariate normal distribution. Univ. 
Calif. Publ. Statist., 1949, 1(4), 37-52.—The 
distribution considered is that of 

$$Y = \sqrt{(X_1 - M_1)^2 + (X_2 - M_2)^2}$$

where X_1 and X_2 are two random variables normally 
distributed with known means M_1 and M_2 and with 
common unknown variance. The present paper 
considers the problem of estimating σ when the 
observations of Y are grouped. It obtains a solution by the 
method of minimum reduced chi-square with linear 
restrictions and derives the asymptotic variance of 
the estimate. Problems of optimum grouping are 
discussed with the aid of numerical tables and charts. 
—W. W. Grings. 





72. Tukey, John W. (Princeton U., N. J.) 
Discussion. J. clin. Psychol., 1950, 6, 61-74.— 
This is a constructively critical analysis of the series 
of articles in the January 1950 Journal of Clinical 
Psychology, on statistics for the clinician. The 
author feels that it is essential that the psychologist 
continue to formulate his hypotheses and 
experimental design first and that he then turn to the 
statistician for the help he may need in developing 
statistical tools.—L. B. Heathers. 
73. Wolfowitz, J. (Columbia U., New York.) 
Minimax estimates of the mean of a normal 
distribution with known variance. Ann. math. Statist., 
1950, 21, 218-230.—The classical estimation 
procedures for the mean of a normal distribution with 
known variance are minimax solutions of properly 
formulated problems. Both sequential and 
non-sequential problems may be treated in this manner. 
Interval and point estimation are discussed. 
Mathematical derivations are included.—G. C. Carter. 

74. Zubin, Joseph. (Psychiatric Institute, 
Columbia U., New York.) Symposium on statistics 
for the clinician. Introduction. J. clin. Psychol., 
1950, 6, 1-6.—This is an introduction to a series of 
articles on statistics most appropriate in clinical 
research. “The purpose of the symposium is to 
collect the adaptations of group statistics and provide 
the clinician with examples of their application to 
his problems.” The assumptions made when the 
universe treated is an individual are discussed. 
Though there is need for new statistical techniques, 
such methods as analysis of variance and covariance, 
discriminant and partial discriminant functions, 
inverted factor analyses, and sequential analysis are 
applicable to the study of clinical problems.—L. B. 
Heathers. 

670. Anastasi, Anne. (Fordham U., New York.) 
The concept of validity in the interpretation of test 
scores. Educ. psychol. Measmt, 1950, 10, 67-78.— 
A discussion of the meaning of validity and common 
misconceptions about its interpretation. The writer 
indicates that a more adequate handling of test 
validity will occur when test scores are “operationally 
defined in terms of empirically demonstrated 
behavior relationships.” The distinction between a 
test and its criterion is seen only as a matter of 
convenience. It is pointed out that both test scores 
and criteria are essentially behavior samples and 
that a criterion may be equally affected by any 
variable that affects the test score.—J. E. Horrocks. 


671. Bartlett, M. S. (U. Manchester, Eng.) 
Tests of significance in factor analysis. Brit. J. 
Psychol. Statist. Sect., 1950, 3, 77-85.—A test of 
significance for analysis into principal components 
is described and illustrated. Lawley’s maximum 
likelihood method is discussed. Equivalent analyses 
of correlation structure, direct derivation of the χ² 
approximation, closeness of the χ² approximation, 
and the effect of eliminating the larger roots are 
examined on a theoretical basis.—G. C. Carter. 


673. Birnbaum, Z. W., Paulson, E., & Andrews, 
F. C. (U. Washington, Seattle.) On the effect of 
selection performed on some coordinates of a 
multidimensional population. Psychometrika, 1950, 15, 
191-204.—In sampling, circumstances sometimes 
make it more likely that individuals from one part of 
a population than from another will be drawn. The 
present method makes it possible, under certain 
assumptions, to correct this bias. As a result, the 
M's, σ's and r's of the original population may be 
reconstructed.—M. O. Wilson. 

692. Gulliksen, Harold, & Wilks, S.S. (Prince- 
ton U., N. J.) Regression tests for several samples. 
Psychometrika, 1950, 15, 91-114.—In some studies 
it is necessary to give a set of tests to two or more 
groups. The question then arises as to whether 
results obtained for the various groups are essentially 
the same. To answer the question three hypotheses 
need to be tested: (1) that all σ's of estimate are 
equal, (2) that all regression lines are equal, and (3) 
that these lines are identical. Criteria for testing 
these hypotheses and illustrative problems are 
presented.— M. O. Wilson. 

693. Hamilton, C. Horace. (North Carolina State 
Coll., Raleigh.) Bias and error in multiple-choice 
tests. Psychometrika, 1950, 15, 151-168.—There is 
a bias in scoring multiple-choice questions, the error 
ranging upward in [illegible] to the number of 
choices. A formula for estimating real scores from 
raw scores is derived. A binomial distribution of 
real scores is not assumed as is true in the Calandra 
formula. Other formulae also derived include those 
for variance of real scores in terms of variance of raw 
scores and for the r between real scores and raw 
scores. Factors affecting the regression of real 
scores on raw scores and the r are number of choices, 
number of questions answered, and the ratio of the 
average or [illegible] raw score to the variance of raw 
scores.—M. O. Wilson. 


694. Harper, Bertha P., & Harman, H. H. 
(Personnel Research Section, AGO, Army, 
Washington, D. C.) An empirical investigation of the 
extent to which biserial and tetrachoric correlations 
approximate the [illegible]. Amer. 
Psychologist, 1950, 5, 370-371.—Abstract. 


696. Jenkins, William Leroy. (Lehigh U., 
Bethlehem, Pa.) A chart for tetrachoric r. Educ. 
Psychol. Measmt, 1950, 10, 142-144.—Presentation 
of a short-cut method for determining tetrachoric r 
designed as a substitute for the out of print Thurstone 
diagrams.—J. E. Horrocks. 


697. Johnson, Helmer G. (U. Minnesota, Min- 
neapolis.) Test reliability and correction for at- 
tenuation. Psychometrika, 1950, 15, 115-119.—The 
results show that specificity or lack of equivalence in 
comparable forms of tests lowers the value of reli- 
ability r’s but not the value of observed trait r's. 
Specificity does not lower the r’s between two tests. 
Since the split-half and equivalent form r’s treat 
specificity as error, however, these r's should not be 
used in Spearman's formula to correct for attenua- 
tion. Such r’s will usually be much too high.— 
M. O. Wilson. 


1386. Cohen, A. C., Jr. (U. Georgia, Athens.) 
Estimating parameters of Pearson type III 
populations from truncated samples. J. Amer. statist. 
Ass., 1950, 45, 411-423.—The method of moments 
is employed with “single” truncated random samples 
to estimate the mean and the standard deviation of a 
Pearson Type III population when α₃ is known and 
to estimate the mean, standard deviation, and α₃ 
when only the form of the distribution is known in 
advance. It is not necessary to assume that 
information is available about the number of variates 
in the omitted portion of the sample. The results 
obtained can be applied to practical problems with 
the aid of “Salvosa's Tables of Pearson's Type III 
Function.”—G. C. Carter. 


1394. Lehmann, E. L., & Stein, Charles. Com- 
pleteness in the sequential case. Ann. math. 
Statist., 1950, 21, 376-385.—The existence of un- 
biased estimates with uniformly minimum variance 
is discussed. A general necessary condition for 
uniqueness is found and applied to obtain a complete 
solution for the uniqueness problem when the 
random variables have Poisson or rectangular 
distribution. Necessary and sufficient conditions are 
also found in the binomial case without the restriction 
to bounded estimates. This permits the statement 
of a somewhat stronger optimum property for the 
estimates, and is applicable to the estimation of 
unbounded functions of the unknown probability. 
—G. C. Carter. 


PSYCHOLOGICAL ABSTRACTS 
1950 
4356. Edwards, Allen L. and Horst, Paul. (U. Washington, 
Seattle.) The calculation of sums of squares for 
interaction in the analysis of variance. Psychometrika, 
1950, 15, 17-24.—The method is described and 
verified for second order interaction. Then it is proved 
and demonstrated that it is applicable for the sum of 
squares for any higher order interaction.—M. O. 
Wilson. 

4357. Green, Bert F., Jr. (Educational Testing 
Service, Princeton, N.J.) A note on the calculation of 
weights for maximum battery reliability. 
Psychometrika, 1950, 15, 57-61.—The weighted composite 
needed to combine tests for maximal reliability is 
shown to be the principal axis of a matrix “closely 
related to the intercorrelation matrix.” Thus a simple 
and straightforward procedure may be substituted 
for the standard but cumbersome method of 
computing these weights.—M. O. Wilson. 
BOOK REVIEWS 


An Introduction to Probability Theory and Its Applications. William Feller 
(Professor of Mathematics, Cornell University). New York: John Wiley and Sons, 
1950. Pp. xii, 419. $6.00. 


DAVID BLACKWELL, Stanford University 


It is a rare event when a mathematics book appears that is admirably 
suitable for a textbook in an undergraduate course and at the same time 
is of great interest to the specialist, but Professor Feller has succeeded in 
writing such a book. 

In keeping with the modern approach to probability theory, the sample 
space—whose points are the possible outcomes of an experiment—is intro- 
duced as a basic concept, and it is emphasized that events are subsets of 
sample space and that random variables are functions defined on a sample 
space. Thus the reader actually finds out what a random variable is rather 
than merely, as in many treatments of probability, what a distribution func- 
tion is. In order to avoid measure theory difficulties, only discrete sample 
spaces are considered. 

Chapter 1 is a description of the sample space concept. Chapters 2, 3, and 
4 discuss combinatorial and occupancy problems, the multinomial and hy- 
pergeometric distributions, Stirling’s formula, and various inequalities relat- 
ing to the probabilities of combinations of events. Chapter 5 introduces the 
concepts of conditional probability and statistical independence. Chapters 6 
and 7 are devoted to the binomial and Poisson distributions. An estimate 
for the error in the Poisson approximation to the binomial is given. The 
derivation of the Poisson distribution as the distribution of the number of 
events occurring in an interval of fixed length under suitable restrictions is 
sketched. The normal approximation to the binomial distribution is obtained. 

Chapter 8 treats an unlimited sequence of independent events. Some of the 
events discussed cannot properly be considered as events within the frame- 
work of discrete sample spaces; their probabilities are evaluated as limits of 
probabilities of appropriate monotone sequences of events in discrete sample 
spaces. Doob’s system theorem—that if x_1, x_2, ... are independent chance 
variables with the same distribution, then any system for choosing which x’s 
to observe according to the values of previous x’s leads to an observed 
sequence y_1, y_2, ... with the y’s independent and having the same 
distribution as the x’s—is proved for binomially distributed x’s. The 
Borel-Cantelli lemma, the strong law of large numbers, and the law of the 
iterated logarithm are proved for independent binomial x’s with the same 
distribution. 

In Chapter 9 the concepts of random variable and distribution function 
are introduced. Standard theorems on expectations are proved, including 
one theorem which is frequently not stated, though always used, in statistics 
textbooks, namely that if X, Y are random variables such that Y = φ(X), 
then Σφ(x_i)p_i, where X assumes the different values x_1, x_2, ... with 
probabilities p_1, p_2, ..., does not depend on the particular X, φ used to 
represent Y, so that there is no ambiguity in defining the expectation of Y 
by this expression. The Kolmogorov generalization of Chebyshev’s inequality 
is proved. 

Chapter 10 treats the weak and strong laws of large numbers. Both laws 
are proved for the case of independent identically distributed variables with 
finite expectation. The proofs illustrate the method of truncation and other 
standard techniques of modern probability theory. Various extensions of the 
laws and the Lindeberg form of the central limit theorem are stated without 
proofs and some applications are given, including a treatment of the Peters- 
burg paradox. 

Chapters 11 through 16 essentially form a unit. The method of generating 
functions is introduced, and is used to develop the author’s theory of 
recurrent events. Essentially a sequence of events A_1, A_2, ... constitutes a 
recurrent event if the chance variables X_1, X_2, ... are independent 
non-negative integer-valued random variables with the same distribution, where 
X_1 + ... + X_i is the time at which A_i occurs. The main result of the theory 
is that if all sufficiently large integers are possible recurrence times, then the 
probability of a recurrence at time n approaches 1/E(X_1) as n → ∞. Examples 
of recurrent events to which the results are applied are returns to the origin 
in a random walk, runs of a given length, and returns to a given state in 
Markov chains with a finite or countable number of possible states. The 
entire theory of Markov chains with a finite or countable number of states is 
developed by the method of recurrent events. The treatment is simpler and 
more complete than any hitherto available in the literature, and constitutes 
an important addition to probability theory. 

The final chapter introduces the reader to the theory of stochastic proc- 
esses with a continuous time parameter. All processes discussed describe 
systems whose possible states are the non-negative integers; various assump- 
tions about the transition probabilities are shown to lead to systems of 
differential equations determining the functions P_n(t)—the probability that 
the system is in state n at time t. The Poisson process and the birth and 
death process are used as illustrations, and several applications are given. 
The basic backward and forward Kolmogorov systems of differential equa- 
tions for Markov processes are derived, and the relation between the two 
systems is discussed. 

There are more than 300 well chosen problems, many of which contain 
important extensions of the material given in the text. The style is lively 
and informal, and gives the reader an intuitive feeling for the methods and 
applications of probability theory. There is a sentence on page 56 which 
seems to suggest that when a test at the 1% significance level rejects a hy- 
pothesis, 99 times in 100 the hypothesis is false, but this is the only instance 
noted by the reviewer of a statement which is likely to mislead the reader. 
This is a beautiful book and an important one, and Professor Feller’s readers 
await eagerly the appearance of volume II. 


Probability and the Weighing of Evidence. I. J. Good. London and New York:
Charles Griffin and Company, Ltd., and Hafner Publishing Company,
respectively, 1950. Pp. viii, 119. $3.00.


L. J. Savage, University of Chicago


According to Good’s view, as expressed in this book, a probability measures
the credence with which a person would regard a proposition in the
light of other propositions regarded as evidence. Pursuing the implications 
of this view, he regards the theory of probability as an apparatus by which 
the person who uses it may check his own system of belief, or credences, for 
consistency and—if this is indeed a separate function—discover new beliefs 
implied by those he already holds. Good emphasizes that the criteria supplied 
by the theory are not so rigid as to require different people to hold the same 
beliefs, even when they are both supplied with the same evidence. Thus, 
personal judgment plays an important part in his theory. Good is not the 
only worker in the foundations of probability to hold a view with these 
general features; those of Ramsey, de Finetti, and Koopman may be men- 
tioned as earlier examples. 

The view of the foundations of probability which has dominated the rapid 
advance of statistical methodology during the past quarter of a century is 
sharply opposed to Good’s. It sees probability not as a property of the user 
of the concept, but as a property of certain things in the external world, such 
as gaming apparatus and the genetic mechanisms of organisms. All experi- 
enced statisticians recognize that judgment is important in statistical prac- 
tice, but they generally suppose that there can be no useful theory of 
judgment. Good is well aware of this currently popular position and brings 
strong arguments to bear against it. 

Though the position expressed by Good in the book under review is not 
new in its general outlines, his treatment is most original in its details, and
the book may do much to show the merits of that position to those who
employ the notion of probability, particularly statisticians. Others who have 
held the position have devoted considerable energy to discussing how the 
transition can be made from qualitative rules governing judgment of relative 
probability to the quantitative concept ordinarily associated with mathe- 
matical studies of probability. Good deliberately avoids this problem for the 
sake of others less thoroughly treated by his predecessors. He therefore 
introduces quantitative axioms (essentially on the Kolmogoroff pattern), 
taking as motivation, but not as proof, the usual analysis of gaming situa- 
tions into equally likely events. These quantitative axioms are interpreted 
as referring not to frequencies but to the credibility which a person, whom 
Good names “you,” attaches to propositions on the basis of hypotheses. The 
chief merit of his book, it seems to me, is the relative thoroughness with 
which this interpretation is explored through the discussion of general princi- 
ples, and through illuminating topics and examples, many of which are 
especially pertinent to statistics. 

The book is in principle self-contained, so all conclusions of the ordinary
mathematical theory of probability that are used are either demonstrated
or mentioned explicitly with references to the literature. This relative com- 
pleteness sometimes makes the book a little boring to a reader who already 
knows the elements of mathematical probability, but this is offset by the 
possibility that the book is thereby made accessible to more readers, not to 
mention other obvious advantages. 

Good’s style, which seems to me to derive largely from the literature of 
logic and mathematics, is striking and, except for occasional obscure passages,
effective. For the most part, the atmosphere of an animated conversation is
maintained, though the writing is in some respects very formal, much use
being made of footnotes and references, both forward and backward. Aphor- 
isms are sometimes interjected, in which compactness is achieved by a 
faintly humorous insistence on the literal meaning of words. Good says, for 
example, “The saving of time is worth while in any application that is either 
urgent or not exceptionally important.” 


Le Calcul des Probabilités et ses Applications. Colloques Internationaux du 
Centre National de la Recherche Scientifique (XIII). Paris: Editions du Centre 
National de la Recherche Scientifique, 1949. Paper.


M. Loève, University of California (Berkeley)


The French Scientific Research Center (C.N.R.S.) organized in 1948, at
the University of Lyon, an International Colloquium on Probability
theory and its applications. Sixteen papers were presented and are published 
under the above title. 

Only three papers are statistical in nature. J. Wishart presents an exposi- 
tory paper on tests of homogeneity of regression coefficients, which includes 
Bartlett’s (1934, 1937) and Carter’s results (published subsequently in 1949), 
together with applications in the analysis of covariance. P. Delaporte states 
his results in factor analysis: he replaces Spearman’s criterion of null 
tetrads by equalities between ratios of correlation coefficients and gives the 
distribution of these ratios with an estimation of the correlation between 
measurements of a character and the general factor. E. Halphen presents a 
few general objections to the selection, a priori, of an optimum rule. 

There is another paper, that of Doob, which has a direct bearing upon 
statistics, and which, together with that of G. Ottaviani, is centered about 
the notion of convergence with probability one (a.s. convergence). 

J. L. Doob, after Ville, P. Lévy, and others, developed the theory of
“martingales” into a powerful tool for the study of a.s. convergence: a
martingale is a sequence $\ldots, X_{-1}, X_0, X_1, \ldots$ of random variables
with finite expectations, such that the conditional expectation of $X_n$ given
the preceding random variables is equal to $X_{n-1}$, with probability one.
Using this theory, Doob obtains an elegant proof of Kolmogoroff’s strong law of
large numbers and, under very general assumptions, gives a solution of the
consistency problem. G. Ottaviani in his paper emphasizes (somewhat arbitrarily
in the opinion of the reviewer) the role and importance of Cantelli’s
extension of Borel’s strong law of large numbers and Cantelli’s “abstract”
probability theory, as compared with Borel’s contribution and von Mises’s
theory.
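
The verbal definition of a martingale given above may be restated in symbols. The expectation operator $E$ and the conditioning bar are modern notation assumed here; the review itself states the property only in words.

```latex
% Finite expectations and the martingale property, for every index n
E\lvert X_n \rvert < \infty, \qquad
E\left[\, X_n \mid \ldots, X_{n-2}, X_{n-1} \,\right] = X_{n-1}
\quad \text{with probability one.}

% Taking expectations on both sides shows that the mean is constant in n
E[X_n] = E[X_{n-1}] .
```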

Six papers are devoted to random functions. Those of J. Ville, Blanc- 
Lapierre and Kampé de Fériet are centered upon harmonic analysis of 
random functions. 

Kampé de Fériet shows that harmonic analysis in the stationary case ex- 
tends to the case of a parameter set which forms an abelian group. Blanc- 
Lapierre, who had made interesting contributions to the theory and physical 
applications of random functions (frequently in collaboration with R. For- 
tet), examines the problem of localization of energy on the axis of frequencies 
and some related ergodic problems as well as conditions under which all the 
moments of a random function are stationary. J. A. Ville discusses the notion 
of flux of information in telecommunications and its relations with the 
spectra of random functions of information; he also emphasizes the physical 
interest of expansions into double series of elementary signals. 

H. Wold studies in detail “punctual” stationary processes and shows their 
wide range of applicability, in particular to telecommunications. 

R. Fortet obtains numerous results relative to stationary and nonsta- 
tionary regimes of a telephone exchange, considered as simple and con- 
stant Markoff processes. P. Lévy, in a penetrating paper, proves that 
double Markoff processes which are stationary and normal are degenerate. 
He conjectures that all double Markoff processes are, at the best, semi- 
degenerate. 

Remaining in the domain of Markoff processes, we shall mention the in- 
teresting paper of G. Malécot who studies the genetical evolution of a popu- 
lation when the genetical compositions of successive generations are random 
vectors which form a simple Markoff chain. 

Out of the four remaining papers, three belong to probability theory proper 
and one to econometrics. 

M. Fréchet states a number of results relative to “typical” values of order 
zero and of infinite order of a random variable; proofs and various details 
are to be found in two subsequent papers (Revue Internationale Statistique, 
1948, Annales Ecole Normale Supérieure, 1948). D. van Dantzig transforms 
the method of generating functions into a much more flexible one, by in- 
troducing superabundant parameters. Among other results, he obtains a 
proof of a conjecture of Fréchet relative to the asymptotic behavior of runs. 
G. Darmois examines various cases in which the difference between a joint 
distribution function of two random variables and the product of the two 
marginal distribution functions is of constant sign. Finally, H. Eyraud states
in his paper that the additivity axiom of probability theory has no equivalent
in econometrics and discusses differences between economic systems of
barter and those of credit.

To the reviewer, this set of papers appears to be fairly characteristic of 
the current poles of interest in probability theory and its applications, at
least in Europe. 


First Course in Probability and Statistics. J. Neyman (The University of
California, Berkeley). New York: Henry Holt and Company, 1950. Pp. ix, 350. $3.50.


J. E. Morton, Cornell University 


The year 1950 was a productive one in the area of probabilistic literature
on the introductory level. At least three major undertakings deserve 
serious attention by the statistician: Feller’s Introduction to Probability 
Theory and Its Applications, Fortet’s Calcul des Probabilités, and the presently
reviewed First Course in Probability and Statistics by Jerzy Neyman. 

These three books are notable not only because of the undisputed fame
of their authors but also because all three of them address themselves to 
those unfamiliar with probability theory proper yet engaged in its applica- 
tion. As to the fields of application, all three are directed toward the physical 
and biological sciences rather than toward the social studies. 

It is perhaps no accident that the many textbooks, as well as cookbooks, 
written over the past few years in the area of statistics have finally led to a 
concerted effort to fill in the badly needed logical foundations and to bring 
them within reach of the student of statistical techniques. 

Neyman’s First Course is the most elementary of the three as far as re- 
quired knowledge of conventional mathematical language is concerned. This 
does not mean at all that the exposition is non-mathematical in character; 
on the contrary, great stress is put on concept and use of mathematical 
models. However, a good knowledge of mathematics on the College Algebra 
level is all that is necessary for the reader to follow the text, provided he 
manages to keep up with the mathematical vocabulary which is skillfully 
and unobtrusively developed as the author proceeds from basic principles to 
the final chapter. 

The book is essentially a reproduction of lecture notes which were issued 
in 1947 and 1948 for use in a one-semester beginners’ course at the University
of California, Berkeley. It is divided into five chapters. 

The introduction provides the noetic setting, closely following Professor 
Neyman’s lecture given at the September 1947 Meeting of the International 
Statistical Institute. This introduction will not be easy going for most be- 
ginners unless they are rather mature. However, after thoroughly absorbing 
it—probably only after reviewing it at the end of his course—the reader may 
rest assured that he will have acquired a better understanding of the scope 
and foundation of a part of modern statistical theory than many of those 
well versed in technique and familiar with various tricks of the trade. 

Chapters two and three, which occupy nearly one-half of the entire text, 
are given to classical probability theory and its application. Since the author 
limits himself to the consideration of finite probability sets, the text accom- 
plishes conceptual rigor without imposing on the reader’s knowledge of more 
than quite elementary mathematics; even such topics as simple combinatorial 
analysis, the geometric series, etc., are treated explicitly, to the extent of 
devoting to them close to one-fifth of the chapter on Probability. The last 
section raises the question of more complicated situations than can be 
handled by semi-automatic application of elementary theorems. Under the 
heading of Evaluation of Competing Risks, Professor Neyman gives here one
of the best operational introductions into the concept and use of model build- 
ing. 

Chapter three takes the student to the application of the theory developed
in the preceding chapter and is devoted entirely to problems in genetics. 
This may leave the student who is unfamiliar with or uninterested in this 
particular field rather shorthanded and frustrated. 

The fourth chapter attempts to link probabilistic and statistical patterns 
of thought. The concepts of random variable and distribution are explained, 
and a well-conceived treatment of the meaning and use of limiting processes 
familiarizes the student with the content of the statistician’s standard tool- 
kit of distribution functions (binomial, Poisson and normal). A concluding 
section on the geometric interpretation of Laplace’s theorem is likely to appeal
to those who find the “optional” analytical proofs given at the end of
the chapter too difficult. 

Chapter five completes the book and deals with the elements of the theory 
of testing hypotheses. From the statistician’s point of view, this is the most 
interesting part of the book. To this reviewer the presentation is the best 
yet, on an elementary level, of the conceptual bases for a substantial part of 
present statistical theory. It is, without doubt, the first authoritative intro- 
duction to the “Neyman-Pearson” theory. Its reading should prove reward- 
ing also to the more “advanced” student—advanced in the sense of famili- 
arity with statistical techniques and know-how—who has had no previous 
opportunity to clear up his uncertainties and inferiority complexes with re- 
spect to the meaning of the very techniques to which he had become ac- 
customed via drill and general usage. 

The book is well written, in simple and clear language, and ample prob- 
lems and exercises are provided. The intensity of treatment of various topics 
is not quite uniform, and some teachers may find the unavoidable gaps irk- 
some; this may be true particularly if this course is to be immediately fol- 
lowed by a statistics course on the intermediate level, oriented toward the 
use of stochastic methods in economics and in some of the other social studies. 

Final evaluation of the content of a textbook must rest upon its usefulness 
which, in turn, depends on the whole curriculum of which the specific course 
is to be a part. Since the goals set for the training of statisticians are still 
ill-defined, varying from university to university, and from teacher to teacher 
within the university, no consensus prevails on a principle of optimum 
allocation of time or subject matter to the statistics course. 

Whatever the answer to this problem, there should be no doubt as to the 
necessity for a diet which can be administered to a student in the early 
stages of his education in order to counteract the overdoses of highly con- 
centrated cure-alls and patent medicines made according to routine pre- 
scriptions. 

Professor Neyman’s First Course, therefore, is a “must,” at least as col- 
lateral reading, for the more thoughtful beginner who will later find himself, 
for better or worse, in the business of consuming, producing, and disseminat- 
ing inferences in various subject matter fields. 


Statistical Inference in Dynamic Economic Models. Cowles Commission Research
Staff Members and Guests. Edited by Tjalling C. Koopmans, with an Introduction 
by Jacob Marschak. Cowles Commission Monograph No. 10. New York: John 
Wiley & Sons, 1950. Pp. xiv, 438. $6.00. 


Millard Hastay, National Bureau of Economic Research


Although this book is addressed to economists, its central problem is not
intrinsically economic and its main results may lead to applications in
other fields at least as fruitful as those in economics. To paraphrase the
Introduction, the book is concerned with the statistical treatment of data 
generated by systems of relations that are stochastic, simultaneous, and 
perhaps dynamic. A stochastic relation is one that holds apart from a dis- 
turbance describable by a probability distribution. Simultaneous relations 
are systems of two or more equations satisfied by the same sets of values of 
a given set of variables. A dynamic relation is one in which time plays an 
essential role. As anyone familiar with the work of the Cowles Commission 
appreciates, what is new in this treatment are the simultaneous character of 
the relations studied and the nature of the data as time series, which means 
that successive observations may not be independent. When proper account 
is taken of these complications, two difficulties new to statistical theory 
arise: (1) observational data may fail to yield definite solutions for the unknown
parameters (the problem of identification); (2) classical least-squares
methods may be inapplicable even when the first complication is absent (the 
problem of least-squares bias). The tasks set by the authors are, first, to 
develop criteria for characterizing systems whose parameters are determinate 
and, second, for such systems to work out methods of estimation appropriate 
to the identifiable parameters. 
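
The problem of least-squares bias can be made concrete with a small simulation. The sketch below assumes a hypothetical two-equation system, consumption C = a + b*Y + u together with the identity Y = C + I, where I is fixed outside the system. This model, its parameter values, and the function names are illustrative assumptions and are not taken from the monograph.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x, with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def simulate(n=20000, a=1.0, b=0.6, seed=0):
    """Return (direct, indirect) estimates of b from simulated data."""
    rng = random.Random(seed)
    invest = [rng.gauss(0.0, 1.0) for _ in range(n)]   # exogenous I
    shock = [rng.gauss(0.0, 1.0) for _ in range(n)]    # disturbance u
    # Solve C = a + b*Y + u and Y = C + I jointly for Y, then C = Y - I.
    income = [(a + i + u) / (1.0 - b) for i, u in zip(invest, shock)]
    consumption = [y - i for y, i in zip(income, invest)]
    # Direct least squares of C on Y: Y is correlated with u, so this is biased.
    direct = ols_slope(income, consumption)
    # Regressing C on the exogenous I estimates b / (1 - b); solve back for b.
    reduced = ols_slope(invest, consumption)
    indirect = reduced / (1.0 + reduced)
    return direct, indirect


direct, indirect = simulate()
print(f"true b = 0.6, direct least squares = {direct:.3f}, indirect = {indirect:.3f}")
```

Under these assumptions the direct estimate settles near (b + 1)/2 = 0.8 however large the sample, because income depends on the disturbance. The estimate obtained through the exogenous variable settles near the true value.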

With but minor violence to the facts, it can be said that success in solving 
these problems is confined to stationary linear systems, serially independent 
additive disturbances, error-free observations, and large samples. A number 
of chapters are concerned with relaxing one or another of these restrictions; 
but they are notable chiefly for making clear the formidable technical diffi- 
culties which stand in the way of a more general approach, and for showing 
that simple analogies with past statistical developments do not carry very 
far. These chapters, however, no less than the main memoir on linear systems, 
contain numerous pieces of close analysis which should prove of value to 
mathematical statisticians. For others the book may prove less rewarding, 
for the analysis is conducted with a proliferation of symbols that is bound 
to exhaust all but the mathematically hardened. 

Though economics is not intrinsic to the theme of this book, it nevertheless 
provides its motivation and its intended field of application. In this aspect 
the book provokes questions that are not so much statistical as philosophical. 
Is the most fruitful view of economic theory that which treats it in essential 
analogy with mechanics and meteorology? Such is the philosophy of the 
econometric school, but it must be said that those who have been hoping for 
a further statement on this question in this book will be disappointed. Apart 
from a few trivial examples, solely expository in purpose, a stochasticized 
Walrasian model is arbitrarily laid down as the essence of economic reason-
ing, and the authors take as their text a principle of Haavelmo that every 
testable economic theory should provide a precise formulation of the joint 
probability distribution of all observable variables to which it refers. It can 
be argued, however, that Haavelmo’s principle is sounder than the program 
for realizing it worked out in this book. For, as noted above, what we are 
asked to assume is that the precept can be carried out in economics by tech- 
niques which are established for linear systems, serially independent dis- 
turbances, error-free observations, and samples of a size not generally ob- 
tainable in economic time series today. In view of such limitations, anyone 
using these techniques must find himself appealing at every stage less to what 
theory is saying to him than to what solvability requirements demand of 
him. Certain it is that the empirical work of this school yields numerous 
instances in which open questions of economics are resolved in a way that 
saves a mathematical theorem. 

Still, there are doubtless many who will be prepared to make the assump- 
tions required by this theory on pragmatic grounds. We cannot know in 
advance how well or badly they will work, and they commend themselves 
on the practical test of convenience. Moreover, as the authors point out, a 
great many models are compatible with what we know in economics—that 
is to say, do not violate any matters on which economists are agreed. At- 
tractive as this view is, it fails to draw a necessary distinction between what 
is assumed and what is merely proposed as hypothesis. This distinction is 
forced upon us by an obvious but neglected fact of statistical theory: the 
matters “assumed” are put wholly beyond test, and the entire edifice of con- 
clusions (e.g., about identifiability, optimum properties of the estimates, their 
sampling distributions, etc.) depends absolutely on the validity of these as- 
sumptions. The great merit of modern statistical inference is that it makes 
exact and efficient use of what we know about reality to forge new tools of 
discovery, but it teaches us painfully little about the efficacy of these tools 
when their basis of assumptions is not satisfied. It may be that the ap- 
proximations involved in the present theory are tolerable ones; only re- 
peated attempts to use them can decide that issue. Evidence exists that trials 
in this empirical spirit are finding a place in the work of the econometric 
school, and one may look forward to substantial changes in the methodologi- 
cal presumptions that have dominated this field until now. 

It would be unfair to the present authors to write as if they were unaware 
of the foregoing difficulties, but it seems to this reviewer that they have not 
given adequate weight to their reservations in tempering the promise of 
success held out for their methods. We may, however, be witnessing a transi- 
tion. The outlook of this group was formed at a time when the remaining 
obstacles to the econometric program seemed likely to yield without much 
struggle. More recently the limited success which has attended efforts to 
push beyond elementary linear systems shows that an impressive range of 
hard problems stands in the way of greater realism. If this knowledge should 
lead to a new readiness on the part of the econometric school to recognize 
established facts that lie outside the systems they can handle, research
everywhere would profit; for real progress in economic understanding will be 
made in proportion as there is room for all talents to contribute to the com- 
mon goal. 


Technological Applications of Statistics. L. H. C. Tippett (Head of the
Mechanical Processing Division, British Cotton Industry Research Association).
New York: John Wiley and Sons, 1950. Pp. ix, 189. $3.50.


Besse B. Day, U. S. Naval Engineering Experiment Station 


I like this book. First, because it fills a long-time need in bringing together
under one cover a well-proportioned and integrated treatment of the two
broad phases of methodology, statistical quality control and significance 
testing. Too long there has been divided thinking on these subjects, with 
proponents lining up in one or the other camp. Either all problems must be 
solved by the control chart or there is a flaunted pride in total ignorance of 
that useful tool. This is the first book to present these methods in proper 
balance and standing. Thus the contents are divided into two parts: Part I, 
The Routine Control of Quality, in seven chapters; and Part II, Investiga- 
tion and Experimentation, in six chapters. The author emphasizes that the 
same fundamental concepts are the basis of both with numerous references 
from one part to the other. It has been proposed by one reader that he might 
have gone further by omitting these classifications, using chapter headings 
only. This book makes a very real contribution toward unifying the science 
of statistics. 

The second strong point in favor of the book is its perspective. To expand 
adequately on this we should take a look at the author. Mr. Tippett, Dean
of the British Statisticians in the engineering field, prepared for his career 
with formal training in statistics under Karl Pearson and R. A. Fisher. He 
has had years of practical experience as statistician in the British Cotton 
Industry Research Association, and now occupies a position in the manage- 
rial field of that organization. In 1927, his Random Sampling Numbers ap- 
peared, the first of its kind; in 1931, his well-known and useful textbook, The 
Methods of Statistics, was published; and in 1943, a course of six lectures, 
Statistical Methods in Industry (an excellent pamphlet not well enough known 
in this country), was circulated. And so this author has passed through all 
the stages of growing up in the theory and practice of statistics. From his 
present vantage point I feel that Tippett is able to evaluate the place of 
statistics in the scheme of things and that he can justifiably speak with 
authority. He does this very neatly. Statistics is only a tool, a servant as it 
were, but a very useful one. Tippett emphasizes the importance of using 
common sense in its application, and of sticking to simple techniques in so
far as possible. He does, however, call attention to the pitfalls, warning the 
reader of difficulties along the way. I anticipate that some statisticians will 
take exception here, will feel there has been over-simplification, that the high 
level of the profession has fared rather badly. 





Before further discussion of this intriguing book, a brief description of 
what it is would be in order. According to the Preface “This book is a ‘write- 
up’ of a course of lectures given at the Massachusetts Institute of Tech- 
nology to a mixed audience consisting of industrialists, some of whom had 
little more than a general appreciation of statistics, students, and practiced 
statisticians working largely in industry.” These lectures were given on the 
tenth anniversary, that is 1948, of a first set which were published in the 
Proceedings of the Industrial Statistics Conference of 1938. It is fortunate 
indeed that these should appear in book form for a wider distribution than 
the 1938 lectures had. 

This is not another textbook in statistics, neither is it a manual. The author 
explicitly states that it is intended as a companion to a textbook or course in 
statistics. Repeatedly the reader is advised to consult a textbook for a more 
complete treatment. It is not a “selling” book either. To gain the most from 
it requires an appreciation of the possibilities of statistics as a tool for solving 
problems. 

It is a mature book, unique in being both practical and philosophical. On 
first examination, this little book, of less than 200 pages, is disarming. In
fact, it appears almost superficial. There are no mathematical proofs, and 
few formulas, these very simple. There are, however, many examples with 
the complete data included, but with little of the actual arithmetic shown. 
One soon finds that there is much which requires careful reading and 
thoughtful consideration. It becomes a very meaty diet. 

Presumably this book is aimed directly at the technical man interested in 
statistics as a tool, but it can be of real value to the statistician as well. 
Both should have it. 

It will be particularly useful to the technical man as a guide in his thinking 
and study along statistical lines. The fundamental concepts are simply 
taught. The author’s examples are very effective in explaining concepts and 
techniques, tying together each new idea with what has gone before. A fine
balance is struck between what should be given and what is left to textbooks.
While painting a very clear picture of basic concepts, methods, etc.,
he opens up vistas of the whole field of statistics, thus challenging the inquir- 
ing engineer to continue on. 

This is an ideal book for the young statistician who is just starting on his 
career in the practical field. It will speed up his learning but may prema- 
turely dampen his enthusiasm. He will be helped to avoid some of the more 
obvious mistakes of the “missionary.” The words of caution in this book are 
based on the author’s own experience. The mature statistician who is 
honestly trying to understand the problems of the technical man in order to 
do a better job will be helped by reading this book. Tippett has given me his 
“feeling” for the applied field in his seeming effort to give the engineer a 
“feeling” for statistics. That may have been one of his aims. 

The first chapter of Part I deals with the concept of measurement in the 
control of quality, discriminating between its technical and statistical as- 
pects. Frequency distribution, population, sample, normal distribution, 
mean, standard deviation, and fraction defective are all concisely developed
and illustrated in a minimum space. This is followed by a thorough discussion 
of the control chart technique, the underlying theory and details in its appli- 
cation, varied uses of it, and special applications. This treatment would serve 
as a companion to one of the manuals listed in the bibliography. Reading it 
gives an understanding and grasp of the subject that these do not provide.
Part I ends with a concise treatment of acceptance sampling. All of this 
within 74 pages. 

In Part II, Investigation and Experimentation, the topics covered are the 
theory of errors, analysis of variance (basic and composite), correlation 
analysis, and the planning of an investigation. 

I particularly liked his discussion of the t-test. There is an excellent tie-up 
here with the first chapter of Part I. Chapter 9, Practical Application of the 
Statistical Theory of Errors, gives a very good discussion of the underlying 
assumptions in applying the theory of errors—normality, equal variability, 
randomness, and inclusiveness of the estimate of error. This chapter is one 
of the best in the book and worth careful reading. 

The discussion of components of error in the chapters on analysis of 
variance is good and timely. Here, as mentioned earlier, the author recommends
caution and the use of the more simple forms. “As the number of
factors increases, the number of types and the complication of the analysis 
increases enormously, and experience is necessary before one can move with 
any confidence in this field.” He states it is “preferable to break down a 
complex field into relatively simple parts and investigate the facts separately, 
rather than combine everything into an omnibus analysis.” 

Following the principle of simplicity, only the more elementary experimental
designs are introduced in the chapter on planning an investigation:
randomized block, Latin square, and split plot. The section headed “Economy”
is particularly recommended. The procedure for evaluating relative
costs of proposed statistical arrangements is outlined in detail. An example 
is used to illustrate the relative error variance per unit cost of several differ- 
ent arrangements. Here the author’s thinking is in terms of management 
based on fundamental statistical principles. 

It is always difficult to organize material for a use other than that for 
which it was originally written. The few weak spots in the book are, I believe,
due primarily to this “writing-up” process. Hidden sometimes in a single 
sentence will be a large amount of “food for thought” which could well have 
been expanded to page length. 

I felt the treatment of interaction on page 126 was weak as compared to 
that of other subjects of no greater importance. Again, the last two sentences 
on page 136 dismissed in a much too off-hand manner the subject of second-
order interactions, not, of course, as it referred to that particular example. A
more general discussion of this subject would have been valuable. 

Another criticism has to do with Chapter 12, Applications of Correlation 
Analysis. The first section headed “Correlation of Two Variables,” is in- 
clined to place the emphasis on correlation rather than regression (however, 
the analysis of variance, p. 147, does use “regression line”), while the second
section headed “Multiple Regression” simply extends the methods of correla- 
tion analysis to cover more than one variable. I would prefer changing the 
first to conform to the second, as regression is generally accepted as being the 
more pertinent measure, particularly in engineering applications. In the 
same chapter, the principle of covariance is introduced but never so called. 
At this stage in the book I believe the technical reader would have easily 
accepted it into his already expanded vocabulary. 

On the middle of page 79 the means of the first and last rows appear to be 
interchanged. 

This will be a tantalizing book for some. It leads to some soul searching 
and to the questioning of much that has been taken for granted. While I 
believe everyone will agree that Mr. Tippett’s book is sound in theory, no 
doubt some exceptions will be taken to his evaluation of statistics as a tool. 

A short but well-chosen Bibliography is included. 


Statistics, Vol. II. N. L. Johnson and H. Tetley. London and New York: Cam- 
bridge University Press, 1951. Pp. xi, 318. $4.00. 


George Nicholson, University of North Carolina


This book is one of a series of textbooks published under the authority of
the British Institute of Actuaries and Faculty of Actuaries to meet the 
needs of students preparing for actuarial examinations. The authors state 
that the syllabus of the actuarial examinations and the amount of time 
which students will be able to devote to the subject have imposed limitations 
on the size and scope of the book. 

The limitations to which the authors refer have resulted in a well inte- 
grated and compact textbook covering the following topics: Chapter 11 (the 
first chapter in Vol. II) “The Calculus of Distribution Functions”; Chapter
12, “Applications of the Calculus of Distribution Functions”; Chapter 13, 
“The Multinomial Distribution and Its Application”; Chapter 14, “General 
Theory of Statistical Tests”; Chapter 15, “Stratified Populations and the 
Analysis of Variance”; Chapter 16, “Correlation Analysis”; Chapter 17, 
“Curve Fitting and Graduation.” Tables of $\chi^2$, t and F are included, as well
as two sections in the Appendix on Beta and Gamma Functions and Multiple 
Integrals. The text is liberally supplied with exercises which have been 
selected to give students practice in applying principles developed in the text 
and to extend certain important principles and methods which the authors 
did not include in the text. Finally there is an appendix section giving 
answers to the exercises and notes on the solutions. 

The emphasis of the book is based on the following point of view: “It 
should never be forgotten that the object of analytical statistics is to draw 
sound and useful conclusions from the available data. . . . The purely mathe- 
matical results are of importance only insofar as they are needed in the 
development of statistical theory, and this theory is itself only a means to 
an end—the study of observational data. It is unfortunate that the deriva-
tion of the results... requires more advanced mathematical equipment
than is to be expected of the ordinary reader. The authors were faced with 
the dilemma that, if a sound and complete mathematical exposition were 
attempted, many readers would be baffled and discouraged, while if all 
mathematical developments were omitted and the results simply quoted, 
the student would not obtain any real understanding of the subject and 
might become a mere technician.” 

The authors have been quite successful in accomplishing what they set 
out to do. By limiting the number of statistical topics and concentrating on 
a few general statistical principles the mathematics required becomes man- 
ageable and the whole book achieves a smoothness and integration which is 
commendable. 

Even the experts will be interested in the fresh and vigorous way in which 
the topics are treated. For example, even though no lengthy multivariate 
techniques are employed, the distribution of the correlation coefficient is 
obtained for the case where either of the two variables is normal and the 
distribution of the other is arbitrary. Another example is the treatment of 
the problem of testing the significance of the difference between two means 
from normal samples when the variances are unequal. The treatment is a 
satisfactory reference on the Behrens-Fisher problem. It is shown that the 
denominator of the $t'$ statistic is a linear function of two independent $\chi^2$’s
and $t'$ is distributed approximately like an ordinary t, the number of degrees
of freedom depending on the ratio of the two unknown variances. Since the
number of degrees of freedom must lie between $n_1+n_2-1$ and the smaller of
the numbers $n_1-1$ and $n_2-1$, it is pointed out that useful conclusions may
often be drawn in this case. Proportional sampling is discussed and com- 
pared with random and optimum sampling. The analysis of variance is in- 
troduced, and its importance noted without letting it get out of control. 
The purposes of curve fitting and graduation are clearly stated and several 
methods are described and criticized. Rank order methods are discussed and 
their importance and utility noted. 
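
As an illustration of the point about degrees of freedom (a sketch only, not taken from the book): assuming the approximation meant is the familiar Welch-Satterthwaite one, the approximate degrees of freedom can be computed directly from the two variances and sample sizes. The function name `welch_df` and the sample sizes are hypothetical. Under this form the value never falls below the smaller of $n_1-1$ and $n_2-1$ and never exceeds $n_1+n_2-2$, so it stays inside the range quoted above.

```python
def welch_df(var1, n1, var2, n2):
    """Approximate degrees of freedom for t' = (m1 - m2) / sqrt(var1/n1 + var2/n2).

    Welch-Satterthwaite form; assumed here, not confirmed as the book's own.
    """
    a = var1 / n1
    b = var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Hypothetical sample sizes; the second variance is held at 1 and the
# first is varied, so `ratio` is the ratio of the two variances.
n1, n2 = 10, 15
for ratio in (0.01, 0.5, 1.0, 2.0, 100.0):
    df = welch_df(ratio, n1, 1.0, n2)
    print(f"variance ratio {ratio:>6}: approximate d.f. = {df:.2f}")
```

Because the true variance ratio is unknown, one conservative use of these bounds is to refer $t'$ to the ordinary t table with the smaller of $n_1-1$ and $n_2-1$ degrees of freedom.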

The examples are noteworthy. They have been carefully selected so that 
in practically all cases they contribute both an understanding of the theo- 
retical point they are intended to illustrate and also have real statistical 
interest. 

This is an excellent statistics book and deserves the attention of all persons 
who are interested in the subject. It is the best book of its kind that this 
reviewer has seen. The authors deserve much credit for a thorough and in- 
telligent job. 


Confidence Limits for the Hypergeometric Distribution. J. H. Chung and D. B. 
DeLury. Published for Ontario Research Foundation by University of Toronto 
Press, 1950. Pp. xiii, 72 unpaginated charts. 9” X12}”. $2.25. 


These charts relate to finite populations in which each item has or lacks
a specified characteristic. A certain proportion (called the “sampling 
rate”) of the population items is selected at random and the proportion of
the sample items having the characteristic is determined. The charts show 
limits within which the population proportion may be expected, with pre- 
scribed confidence, to lie. 

Thirty-five basic charts are shown, covering confidence coefficients 0.95, 
0.975, and 0.995; population sizes 500, 2,500, and 10,000; and sampling 
rates 0.05 and 0.1 to 0.9 by steps of 0.1. Each basic chart is accompanied by 
an enlarged chart for sample proportions below 0.1. There are 12 supple- 
mentary charts for expressing the confidence limits as numbers instead of 
proportions in the population. 

According to the “Introduction,” linear interpolation, or extrapolation to 
populations larger than 10,000, can with sufficient accuracy be based on 
arguments $N^{-1/2}$ and $[(1-s)/s]^{1/2}$, where N is the population size and s the
sampling rate. Two special charts are given as a basis for extrapolation to 
populations smaller than 500. For sampling rates below 0.05, the binomial 
distribution may be used, through the familiar Clopper-Pearson type of 
chart, or the various tables and formulas for binomial confidence limits. 
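
What such charts tabulate can be sketched in a few lines. The following is one way limits of this kind might be computed, by inverting the two one-sided hypergeometric tail tests; it is an assumption that this matches the authors' construction, and the function names and the numerical case are illustrative only.

```python
from math import comb


def hypergeom_pmf(k, N, K, n):
    """P(k marked items in a sample of n drawn without replacement
    from a population of N items, K of which are marked)."""
    if k < max(0, n - (N - K)) or k > min(n, K):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def confidence_limits(x, n, N, conf=0.95):
    """Two-sided limits for K, the number of marked items in the population,
    after observing x marked items in a sample of n (equal-tail inversion)."""
    alpha = (1.0 - conf) / 2.0

    def upper_tail(K):  # P(X >= x | K)
        return sum(hypergeom_pmf(k, N, K, n) for k in range(x, n + 1))

    def lower_tail(K):  # P(X <= x | K)
        return sum(hypergeom_pmf(k, N, K, n) for k in range(0, x + 1))

    # Only these values of K are consistent with the sample actually drawn.
    candidates = range(x, N - (n - x) + 1)
    lower = min(K for K in candidates if upper_tail(K) > alpha)
    upper = max(K for K in candidates if lower_tail(K) > alpha)
    return lower, upper


# Hypothetical case: population 500, sampling rate 0.1, sample proportion 0.1.
N, n, x = 500, 50, 5
lo, hi = confidence_limits(x, n, N)
print(f"limits for the population proportion: {lo / N:.3f} to {hi / N:.3f}")
```

Dividing the limits by N gives proportions, as on the basic charts; leaving them as counts corresponds to the supplementary charts.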

W. A. W. 


Values and Integrals of the Orthogonal Polynomials up to n=26. Daniel B. 
DeLury (Director, Department of Mathematical Statistics, Ontario Research 
Foundation). Published for Ontario Research Foundation by University of 
Toronto Press. Pp. v, 33. $1.25. 


This table is similar to Table XXIII in Fisher and Yates, Statistical Tables
for Biological, Agricultural, and Medical Research, but different in scope.
It is useful in fitting a polynomial by least squares to n observations for 
which the independent variable is in arithmetic progression. It covers sample 
sizes up to n=26, and polynomials up to degree n—1. Instructions and 
tables for finding the area under the fitted polynomial within a given range 
are also included; indeed, this was the principal purpose in compiling the 
tables. There is a useful introduction, which includes examples and explains
the construction of the tables.
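
For readers unfamiliar with the method, a minimal sketch of fitting by orthogonal polynomials follows. It builds the polynomial values by Gram-Schmidt orthogonalization rather than reading them from a table, and it does not rescale them to convenient integers as published tables of this kind usually do; the function names and the observations are invented for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonal_polynomials(n, max_degree):
    """Values at n equally spaced points of polynomials of degree
    0..max_degree that are mutually orthogonal over those points."""
    xs = [i - (n - 1) / 2 for i in range(n)]  # centre x on zero
    basis = []
    for degree in range(max_degree + 1):
        v = [x ** degree for x in xs]
        for phi in basis:  # Gram-Schmidt: remove lower-degree components
            c = dot(v, phi) / dot(phi, phi)
            v = [vi - c * pi for vi, pi in zip(v, phi)]
        basis.append(v)
    return basis


def fit(y, max_degree):
    """Least-squares fit; each coefficient is computed independently."""
    basis = orthogonal_polynomials(len(y), max_degree)
    coefs = [dot(y, phi) / dot(phi, phi) for phi in basis]
    fitted = [sum(c * phi[i] for c, phi in zip(coefs, basis))
              for i in range(len(y))]
    return coefs, fitted


y = [2.1, 3.9, 8.2, 13.8, 22.1, 32.0, 44.2]  # made-up observations
linear, _ = fit(y, 1)
quadratic, fitted = fit(y, 2)
print(linear)
print(quadratic[:2])  # unchanged when the quadratic term is added
```

The practical advantage is visible in the last two lines: raising the degree of the fit adds a coefficient without disturbing those already computed.
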
W. A. W. 


The Economic Theory of Cost of Living Index Numbers. Melville J. Ulmer. 
New York: Columbia University Press, 1949. Pp. 106. 


Arnold C. Harberger, Johns Hopkins University


This small volume presents a simple, readable account of the theory of
cost of living index numbers, together with a contribution toward its 
practical application. It is admirably suited as an introduction to this area 
for graduate students, economists, and statisticians whose major fields of 
interest lie elsewhere, and perhaps for those who have worked on the con- 
struction of index numbers without knowing, in detail, the economic theory 
behind them. 

Ulmer’s substantive contribution consists of an attempt to set limits on
the error involved in using traditional (Paasche or Laspeyres) index number 
formulas to approximate a “true” cost of living index. He makes a plausible 
case for the statement that the official indexes of the cost of living and of 
retail prices are subject to a negligible error. But his argument rests in part 
on a casual type of reasoning which is likely to convince mainly those who, 
perhaps on independent grounds, have already convinced themselves. Those 
who yearn for exactness—even probabilistic exactness—in empirical eco- 
nomics are likely to remain unconvinced by Ulmer’s demonstration. This 
possible defect should not be taken too seriously, however. An imposing array
of economists and statisticians have addressed themselves to this general 
area, and have left their readers, if anything, even less satisfied as to the 
“exactness” of widely used indexes. Ulmer’s attempt to bring empirical 
evidence to bear on a problem previously dealt with on a purely abstract 
level is certainly a step in the right direction. 

Ulmer’s advice for improving the actual construction of index numbers is 
neither very extensive nor very original. Its value lies in its emphasis on the 
need for constant awareness of the purposes for which an index is to be used, 
and in its demonstration, by example, that common sense, together with such 
awareness, may sometimes yield worthy fruit. 


Index Numbers of Industrial Production, Studies in Methods No. 1, Statistical 
Office of the United Nations, Lake Success, New York, 1950. Pp. 60. 25 cents. 
Paper. 


Paul B. Simpson, University of Oregon

This work is designed to assist countries in constructing index numbers
and to promote comparability of indexes among nations. It discusses
classification of industrial activities, meaning of production, selection of 
formula, practical problems of computation, and measures of labor produc- 
tivity. Strong recommendations are made for comprehensive censuses of 
production, annually if possible, and never more than five years apart. Ex- 
perimentation with sample surveys of production is urged. 

Problems of measurement of production are examined from the standpoint 
of national income accounting. Production is pictured as a value contribution 
of firms “with price changes eliminated, which can be approximated by a 
valuation at constant prices.” (P. 7.) This approach leads immediately to 
the choices of the Laspeyres and Paasche index formulas as the basic long 
run measures of physical production, modified as required for reasons of
classification and duplication of productive activities. The writers recognize 
that different choices of constant prices yield different indexes. To meet this 
problem, they suggest frequent chaining and the possible use of a cross 
formula such as Fisher’s “ideal” index. The desirability of reconciling the 
Laspeyres and Paasche forms is stressed, though just why this is desirable 
is not explained. The authors are on sound ground, however, in recognizing 
that comprehensive data, carefully examined, are the most important requi- 
sites of accurate measurement. 
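
To make the dependence on the choice of constant prices concrete, here is a small sketch of the three quantity indexes named above. The formulas are the standard textbook ones; the two-commodity figures are invented, and the function names are mine rather than the report's.

```python
from math import sqrt


def value(prices, quantities):
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres_quantity(p0, q0, q1):
    """Quantities of both periods valued at base-period prices."""
    return value(p0, q1) / value(p0, q0)


def paasche_quantity(p1, q0, q1):
    """Quantities of both periods valued at current-period prices."""
    return value(p1, q1) / value(p1, q0)


def fisher_quantity(p0, p1, q0, q1):
    """Fisher's "ideal" index: geometric mean of the two forms."""
    return sqrt(laspeyres_quantity(p0, q0, q1) * paasche_quantity(p1, q0, q1))


# Hypothetical economy with two fuels (coal, oil); all figures invented.
p0, q0 = [10.0, 20.0], [100.0, 10.0]   # base period
p1, q1 = [10.0, 5.0], [60.0, 60.0]     # current period: oil cheaper, more used

L = laspeyres_quantity(p0, q0, q1)
P = paasche_quantity(p1, q0, q1)
F = fisher_quantity(p0, p1, q0, q1)
print(f"Laspeyres {L:.3f}  Paasche {P:.3f}  Fisher {F:.3f}")
```

With these figures the Laspeyres form shows a rise and the Paasche form a fall, which is the kind of divergence that chaining or a cross formula is meant to moderate.
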

Industrial production is defined rather broadly, covering manufacturing,
construction, electrical generating, and gas producing establishments. The
classification is given in terms of the International Standard Industrial Classi-
fication, and is similar to the old League of Nations recommendations. It is, of
course, considerably broader than the coverage of the Federal Reserve index 
of industrial production, which excludes construction and utilities. 

The national income concepts adopted prove effective in dealing with 
problems of coverage and classification of productive activities. For example, 
change in the amount of work in process is a production element very im- 
portant in activities such as construction. Output series or delivery series 
which do not measure this type of inventory are defective. Another tricky 
point readily handled by national income ideas results from changes in the 
use of business services, brought about when firms make material substitu- 
tions or resort to prefabrication. Problems of duplication arise both in the 
quantity of production series and in the weights, though they are more im- 
portant for the former, as is recognized in the study. The recommended solu- 
tion for quantity series centers on the use of the Geary formula, whereby 
value of business services currently used are subtracted from value of current 
output, both valued in constant prices. 
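
The subtraction just described can be sketched as follows. This is my reading of the formula as the study summarizes it, not a reproduction of Geary's own statement, and the single industry, its prices, and its quantities are all invented.

```python
def constant_price_value(base_prices, quantities):
    """Value of a set of quantities at base-period prices."""
    return sum(p * q for p, q in zip(base_prices, quantities))


def net_output_index(out_p0, out_q0, out_q1, in_p0, in_q0, in_q1):
    """Output less business services used, both at base-period prices,
    current period relative to base period."""
    net0 = constant_price_value(out_p0, out_q0) - constant_price_value(in_p0, in_q0)
    net1 = constant_price_value(out_p0, out_q1) - constant_price_value(in_p0, in_q1)
    return net1 / net0


# Hypothetical industry: two products, one purchased input.
out_p0, out_q0, out_q1 = [4.0, 10.0], [100.0, 30.0], [120.0, 33.0]
in_p0, in_q0, in_q1 = [2.0], [80.0], [110.0]

gross = constant_price_value(out_p0, out_q1) / constant_price_value(out_p0, out_q0)
net = net_output_index(out_p0, out_q0, out_q1, in_p0, in_q0, in_q1)
print(f"gross output index {gross:.3f}, net output index {net:.3f}")
```

Here purchased inputs grow faster than output, as they might after a shift to prefabrication, so the gross series overstates the rise in the industry's own contribution.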

Other excellent suggestions of a practical nature are offered, especially 
useful because they are not all readily available in standard literature. These 
include discussions of different types of production measures, such as ma- 
terial, labor, and energy input series, and deflated value series; imputed 
weights; working-day allowances; and revision schedules. The last deserves 
more attention than it receives, since it is one of the most troublesome fea- 
tures of index number construction both to makers and to users, and since 
the makers do not always keep the users fully informed regarding possible 
revisions and the reasons therefor. 

Although the report is full of suggestions for reaching the best estimate of 
production, it makes almost no suggestions for measuring errors of the esti-
mates. Yet such “standard errors” are necessary if estimates are to be used
properly and to be improved. A question in this regard is the proper use of 
the equation, value equals price times quantity. Independent data on quanti- 
ties of production, prices and values are frequently available. Values of pro- 
duction can be estimated by national income (factor cost) statistics, and by 
use of values of deliveries and inventories. Price quotations are not expensive 
to gather. If output measures are based on physical data and not on deflated 
values, a check on the accuracy of work can be made by means of the value 
equation. (If the factor reversal discrepancy is large, the equation is not 
exact, but the discrepancy can be studied independently.) Perhaps deflated 
values had better be used for this purpose than for improving the accuracy of 
estimates of physical production in particular sectors of the economy. 
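
A sketch of the check I have in mind, using invented figures for three commodities: the change in value is compared with the product of a price index and a quantity index. The variable names are illustrative, and the choice of Laspeyres forms for both indexes is an assumption made only to show a nonzero discrepancy.

```python
def total(prices, quantities):
    return sum(p * q for p, q in zip(prices, quantities))


# Hypothetical data for three commodities in two periods.
p0, q0 = [2.0, 5.0, 10.0], [50.0, 20.0, 10.0]
p1, q1 = [3.0, 5.0, 8.0], [40.0, 25.0, 15.0]

value_ratio = total(p1, q1) / total(p0, q0)          # independent value data
price_laspeyres = total(p1, q0) / total(p0, q0)
quantity_laspeyres = total(p0, q1) / total(p0, q0)
quantity_paasche = total(p1, q1) / total(p1, q0)

# Laspeyres price times Paasche quantity reproduces the value ratio exactly.
exact = price_laspeyres * quantity_paasche

# Laspeyres price times Laspeyres quantity does not; the ratio below is
# the factor reversal discrepancy.
discrepancy = price_laspeyres * quantity_laspeyres / value_ratio

print(f"value ratio {value_ratio:.4f}, exact product {exact:.4f}")
print(f"factor reversal discrepancy {discrepancy:.4f}")
```

Any departure of the product from the observed value ratio beyond the known discrepancy would then point to error in one of the three independent sets of data.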

Another deficiency in index number theory, which is not remedied in this 
study, is the definition of physical production. Definition as value of produc- 
tion in constant prices is a serviceable idea, but it does not provide answers 
to various questions that arise. Suppose that as a result of oil discoveries, the 
price of oil drops, and for that reason oil replaces coal in industrial use. It is
conceivable that the total final product of industry remains the same in two 
periods, while the index rises or falls depending on whether the Laspeyres or 
Paasche formula is used. Distortions in individual industries may be more 
serious. There is no method of determining whether production has fallen, 
risen, or remained unchanged, unless definition of production in terms of 
consumer utility, productive effort, or some other concept is adopted. Nor is 
there any logical method of determining the equivalence of old and new 
products, or of measuring qualitative changes without such a definition. The 
amount of production represented by a rubber tire of 1950 compared to that 
of a tire of 1925, differs vastly according to the definition of product in terms 
of miles of wear, quantity of rubber contained, real production cost, or 
simply units of tires. While the work in review has done a fine job of examin- 
ing the implications of the concepts which it adopts, it has not faced the 
problems of deciding what concepts are required for a suitable measure of 
production. 


The Agricultural Estimating and Reporting Services of the U.S.D.A. Washington, 
D. C.: U.S.D.A., Miscellaneous Publication No. 703, 1949. 


Ivan M. Lee, University of California (Berkeley)


The agency responsible for the major portion of the agricultural statistics
published by the U.S. Department of Agriculture is the Agricultural Esti- 
mates branch of the Bureau of Agricultural Economics. Part 1 of this book 
(about three-fourths of the text) is given over to a description of the data 
collecting and estimating procedures of this agency. Part 2 considers briefly 
some of the analogous activities of other agencies in the Department. 

The material is presented in 23 chapters, more appropriately viewed as a 
set of reports, each prepared by specialists in the various fields covered. 
Chapters 1 through 4 present in condensed form a description of the organiza- 
tion and operating procedures of Agricultural Estimates. Chapter 5 is de- 
voted to a general discussion of the methods of collecting data and procedures 
of estimation. 

In chapters 6 through 10, attention is focused on sources of information 
(acreage, production, yield, and other statistics) and estimating procedures 
for various crop categories (field crops, vegetables, and fruits and nuts). 
Chapters 11 through 14 treat the important livestock and livestock product 
statistics. Chapter 15 is a brief and general discussion of procedures used in 
estimating prices received and prices paid by farmers with some attention 
given to the widely used indexes of prices received and prices paid by farmers 
(as constructed prior to the latest revision released in January 1950). 
Chapter 16 is an even briefer account of the estimates of farm employment 
and wage rates. 

The remainder of part 1 is given over to a discussion of the Department’s 
experience in planning and conducting nation-wide interview surveys (chap-
ter 17), the assessors’ censuses in some 14 states and their usefulness as aids
in the estimating work of the Department (chapter 18), and a brief look into
the future of Agricultural Estimates (chapter 19). 

Part 2, consisting of four chapters, considers estimates prepared by other 
agencies of the Department of Agriculture, with main emphasis given to 
farm income statistics of the Bureau and the information disseminated by the 
Market News Service of the Production and Marketing Administration. 

Attention in this review is centered on part 1 and more specifically on the 
regular estimating procedures of the Agricultural Estimates branch. 

This book gives the most complete account thus far published of the 
methods employed by Agricultural Estimates and serves to extend and bring 
up to date the account published as U.S.D.A. Miscellaneous Publication No. 
171 in 1933. Although some attention is given to the experience in general 
enumerative surveys and other objective methods on a smaller scale, these 
methods form the basis for only a small portion of the estimates made by 
this agency. It is clear from this book that the main reliance continues to be 
placed on the voluntary mailed inquiry as the means of collecting data and 
on the use of graphic regression methods as an aid in eliminating bias from 
the resulting estimates. 

In the chapters dealing more specifically with the estimating procedures 
for various crop and livestock categories, the adaptation of these general 
techniques to specific estimating problems is described. In addition, refer- 
ences are made to a number of sources of independent check data and to the 
efforts to incorporate this additional information into the estimating pro- 
cedures. Exactly how the additional information is used in many cases is 
not described in any detail. The statement that census data serve as bench 
marks in the revision of estimates, for example, does not give the reader a 
specific picture of the role played by census data in the revision process. 
Limitation of space, of course, required a great deal of condensation in the 
description of procedures. The procedures described could have been more 
effectively conveyed to the reader, however, if some of the more important 
estimates regularly made had been carried through the entire estimating 
process from the collection of data to the final revised estimates, indicating 
something on the magnitude of the adjustments made at each stage of the 
process and giving in some detail the basis for such adjustments. As the book 
stands, however, it does give the reader a general picture of the procedures 
used and will serve as a convenient and useful reference for users of the sta- 
tistics published by this agency. 

To this reviewer a disappointing feature of the book is the absence of 
analysis of the estimating procedures directed toward appraisal of the pub- 
lished estimates. The contributors are not criticized for this omission since 
it is apparent that the objective of the book is one of description and not 
appraisal. However, this account given of the methods employed by Agricul-
tural Estimates should serve to stimulate new interest in the difficult prob-
lems of appraisal. It is perhaps not inappropriate, therefore, to dwell briefly
on them here. Those familiar with the literature in this field will recognize 
that the points raised are not original with this reviewer. 

The difficulties in forming judgments regarding the validity and precision
of the regularly published estimates of the Agricultural Estimates branch 
arise mainly from: (1) the possible selectivity of reports in any particular 
survey arising both from selectivity in the mailing lists and the high rate of 
nonresponse, and (2) the opportunity afforded at various stages in the proc- 
essing of the data for the injection of personal judgment. The selectivity of 
the samples is recognized by several contributors to the present volume and 
the procedures of estimation currently in use are directed largely to the ad- 
justment of resulting biases. Opportunities for personal judgment to affect 
estimates are present throughout the collection and estimating process. 

For certain estimates the basic datum collected is judgmental. The use of 
condition reports for localities in forecasting yields may be cited as an ex- 
ample. Yield is particularly suited to objective estimation. Although experi- 
mentation with objective counts and measurements has been conducted ir- 
regularly in selected areas, these methods have not been thus far incorporated 
into the regular yield forecasting procedures. 

Further opportunity for personal judgment to enter the estimates is pres- 
ent in the “editing” and estimation procedures of the state offices and the 
Crop Reporting Board. Typical excerpts from relevant sections of the book 
will serve to illustrate this point (italics in all direct quotations are mine). 
With regard to “editing” in the State Statistician’s office, it is stated with 
specific reference to the August crop report (page 11) that each column on 
the schedule is scanned “to make sure that none of the entries therein differs 
so much from the others as to indicate either a misunderstanding on the part 
of the reporter, or a misplaced entry on the schedule, or an error in listing. 
Even when no error of this sort appears to have been made, a report that is very 
different from the others from a given county may be deleted on the basis that it is 
unrepresentative. Most questionable entries are either deleted or moved into 
the appropriate columns, but some that are apparently not attributable to 
misunderstandings or mechanical errors may be left in to represent minority 
situations in a given county.” In describing the opportunities afforded by 
extensive travel for state office personnel to obtain first-hand information on 
conditions in various areas through personal observation, etc., it is stated, 
“these statisticians make individual field observations and personally ap- 
praise the prospects of yield. They talk with interested and informed per- 
sons... . Back in the office, they interpret the indications derived from the gen- 
eral schedule in the light of the statisticians’ pooled observations and conversa- 
tions.” 

Based on the data collected in his own state, the State Statistician sub- 
mits his estimate to the Crop Reporting Board in Washington. The reviews 
by the Crop Reporting Board are described on page 12 again with specific 
reference to the August crop report on “nonspeculative” items for the state 
of Illinois. “In the first, or State, review, a member of the Crop Reporting 
Board reads the statistician’s comments and using much the same tech- 
niques as followed by the statistician in Illinois, arrives at his own recom- 
mendation for each item.” The state reviewer’s recommendations are then 
submitted to the entire Crop Reporting Board, and “... using the same 
techniques which were used in the State office and in the State review, they
(the Board) review the estimates for all States, approve or disapprove
changes made by the state reviewer, and make other changes if they believe the
data warrant them.” Finally, “All changes are then approved or disapproved by 
the Chairman of the Crop Reporting Board.” 

A final point giving rise to difficulties of appraisal is the revision process. 
Most estimates made by the Department go through several revisions after 
preliminary estimates have been released. For several types of estimates and 
for several commodities, fairly complete information becomes available at a 
later date through such sources as assessors’ state censuses, the federal agri- 
cultural census, or other sources (for example, reports of cotton ginnings by 
ginners). For other types of estimates, check data are not too plentiful. Even 
in cases where more complete data become available, their use as bench
marks is usually possible only after some adjustment for incompleteness and 
conceptual incomparabilities. An appraisal of the validity of the final esti- 
mates would seem, therefore, to involve, in some cases at least, an appraisal 
of the validity of the bench marks and some analyses of procedures used in
relating final revisions to these bench marks. 

With this combination of estimating procedures, objective appraisal of the 
resulting estimates is not possible. The main reasons given for the continued 
use of these procedures in the face of modern developments in sampling 
methods are the familiar ones of budgetary limitations and the limited time 
available between collection of data and release of estimates. There is little 
doubt that a complete shift to more objective methods would be costly. This 
reviewer can not escape the conclusion, however, that for a number of esti- 
mates empirical comparisons with data provided by more objective pro- 
cedures constitute the only basis for really significant appraisals of reliability. 
Such comparisons in many cases would become dated quickly; hence, peri- 
odic checks would seem to be called for. With respect to costs, one might 
reasonably wonder whether the important role played by certain estimates 
(for example, prices) in the operation of government agricultural programs 
does not justify additional expenditures on the collection of data and more 
extensive research leading to an appraisal of the estimating methods cur- 
rently in use. 

Lacking these objective checks, the user of agricultural statistics would be 
aided somewhat in his personal appraisal of the estimates if more specific 
information were made available concerning the adjustments made in the 
“editing” and estimating process in the state offices as well as by the Crop 
Reporting Board. Empirical comparisons of “unedited” with “edited” data, 
direct sample estimates with State Statisticians’ recommendations, and 
State Statisticians’ recommendations with final estimates would be helpful 
in this connection. Of lesser importance, perhaps, but still useful would be 
some more detailed accounts of the procedure of revising estimates from in- 
dependently derived bench marks currently in use. 

Returning to the specific objectives of the book itself, it can be said that 
contributors to this volume have rendered a real service in making available 
this account of the methods underlying the growing volume of agricultural 
statistics published by the Department. It is hoped that the descriptive ac-
count will be followed by more analytical reports of the research undertaken
in the continuing development of the agricultural statistics program in the 
Department. 


Techniques of Preparing Major BLS Statistical Series (Bulletin 993). U. S.
Bureau of Labor Statistics. Washington, D. C.: U. S. Government Printing Office,
1950. Pp. vii, 72. 40 cents. Paper.


Cedric Wolfe, Metropolitan Life Insurance Co.


This is a collection of thirteen technical notes from the Monthly Labor
Review, September 1949 through April 1950. The series covered are con-
sumer and wholesale prices, employment, labor turnover, hours and earnings
in industry, occupational wages, union wage and hour scales, productivity, 
work stoppages, industrial injuries, and construction. The series themselves 
are not tabulated in this Bulletin. Its purpose is to describe the sources of 
the data, the methods of collection, the statistical procedures involved in 
computation, and the limitations of each series. As the preface states, the 
notes are “written primarily from the point of view of the consumer and not 
the producer of the data.” Few algebraic formulas are shown; instead, de- 
scriptions are in narrative form, so as to be complete and understandable to 
laymen. 

The B.L.S. has carried out its purposes admirably and deserves high com- 
mendation. Other producers of statistics (both Government and private) 
have a similar obligation to their consumers and this should set an example 
for them. Since these notes were prepared by different people, they are not 
uniform in quality and clarity. Doubtless, as some users of the data raise 
questions about them, the B.L.S. will be able to make improvements. The 
Bureau might consider whether to issue future editions in loose-leaf form. 
Already one of the most important descriptions—that of the consumers’ 
price index—needs to be supplemented by the technical note which appeared 
in the Monthly Labor Review of April 1951 entitled “Interim Adjustment of 
the Consumer Price Index.” 

A few points require critical comment. On pages 2-3, there is a reference to 
substitution of items in the consumer price index when identical ones are no 
longer available in retail stores. It is pointed out that when a substitution 
serves the same purpose but is not of the same quality and is described by a 
new specification, it is introduced by a linking process and not permitted to 
affect the level of the index. An example of this type of substitution is cited, 
the replacement of silk hose by rayon during World War II. Probably this 
is a good description of a sound technique to follow in peacetime. It should, 
however, be pointed out that this linking procedure has not been followed 
uniformly and that during World War II the B.L.S. deviated from it sub- 
stantially, comparing the prices of wartime substitutes directly with the 
prices of pre-War articles that had disappeared. Such deviation was, of 
course, sound method for those times.

The note on the consumer price index could well have included a few other
items of interest and service to readers, e.g., the fact that the component 
rent index suffers from new-unit bias. Also, that during the war the con- 
sumer price index had other short-comings, because it could not include 
black market prices, or allow for quality deterioration, etc. And that, despite 
this, qualified committees of experts making official investigations found that 
the understatement was not more than 3 or 4 per cent by the end of 1943, 
and 5 percentage points by 1945. 

In the section on monthly and weekly wholesale price indexes, there is a 
good, clear definition of coverage. We are told that these indexes are “not a 
measure of prices charged by wholesalers (i.e., jobbers or distributors),” but 
rather are confined to prices for large quantities quoted by manufacturers, 
commodity exchanges, etc. 

Although the sensitive daily index of spot primary market prices for 28 
commodities is not considered a “major” B.L.S. series and is therefore not 
discussed, it would have been worthwhile to touch upon it under wholesale 
prices, giving suitable warning about its smaller coverage and greater vola- 
tility than the weekly and monthly indexes. 

While pages 26-7 show that the B.L.S. in its collection of wholesale price 
data asked for information on changes in discounts, it does not say what is 
done about such changes, and does not disclose whether an attempt is made 
to measure changes in actual prices instead of in nominal quotations. 

At various points in the descriptions of the construction series, the reader 
is told that primary data such as building permits are finally converted into 
estimates of dwelling units started and dollar values of construction per- 
formed in succeeding months, according to predetermined monthly patterns 
based upon prior special studies. Since these are doubtless based on samples, 
there should be given some indication of the coverage and date of the sam- 
ples. 

Despite these questions, the Bulletin will prove very useful. 


Divergence between Plant and Company Concentration, 1947. Federal Trade 
Commission. Washington, 1950. Pp. v, 162. 40 cents. Paper. 


George J. Stigler, Columbia University


THIS study analyzes the difference between concentration of “output” in
the largest companies and in the largest plants (establishments) in manu- 
facturing in 1947. The basic data are: (1) the value of output of the largest 
4, 8, 20, and 50 companies in various industries; and (2) the value-added of 
the largest establishments in these industries, where the size of the establish- 
ment is measured by number of employees. The procedure is to (1) plot the 
available points on the curves of the cumulative percentage of “output” 
against the number of firms or establishments ranked by size in descending 
order; (2) interpolate smooth curves by free-hand methods; (3) measure the 
area between the plant and company curves for each industry; and (4) ex- 
press the areas as multiples of the area in the median industry. The final 
result is a single number for each industry: e.g., the index 6.53 for condensed
milk means that the area between its plant and company curves is 6.53 
times that in the median industries (ball and roller bearings, electrical ap- 
pliances). Neither the data nor the procedures are satisfactory. 
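
The four-step procedure described above can be sketched roughly as follows. This is only an illustrative reading of the description, not the Commission's actual computation: the function names are hypothetical, straight-line interpolation stands in for the free-hand curves, and the cutoff of 50 ranks is taken from the company limit mentioned later in the review.

```python
from statistics import median


def interp(points, x):
    """Linearly interpolate a cumulative-share curve given as (rank, percent) points.

    The origin (0, 0) is added automatically; ranks are assumed to be distinct.
    """
    pts = sorted([(0.0, 0.0)] + [(float(r), float(p)) for r, p in points])
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the plotted range")


def divergence_area(company_pts, plant_pts, max_rank=50):
    """Area between the company and plant curves over ranks 0..max_rank (trapezoid rule)."""
    gaps = [interp(company_pts, k) - interp(plant_pts, k) for k in range(max_rank + 1)]
    return sum((a + b) / 2.0 for a, b in zip(gaps, gaps[1:]))


def divergence_index(areas):
    """Express each industry's area as a multiple of the median industry's area."""
    mid = median(areas.values())
    return {name: area / mid for name, area in areas.items()}
```

With only two or three observed points per curve, nearly every value returned by `interp` is interpolated rather than observed, which is the weakness the review goes on to press.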

As to data: Establishments are selected by number of employees and their 
output is measured by value-added, and this “output” is not comparable
with value of product of companies selected by value of product. The various 
classifications of establishments in the 1939 Census of Manufactures permit 
a measurement of the biases introduced by the 1947 classification, and a 
study of sample industries suggests that (1) the direction and magnitude of 
the bias varies considerably among industries, but (2) in general plant con- 
centration is understated. It is not surprising, therefore, that in 15 industries 
a negative divergence is found, i.e., the plants were more concentrated than 
the companies, although presumably no plant was owned by more than one 
company. 

The procedure is equally hospitable to objections. One must exclude all 
industries with less than some arbitrary number of companies (50 was 
chosen)—which is itself illogical—so areas will be comparable; this elimi- 
nates 97 of 452 industries. One must make brazen extrapolations. The largest 
plant size in motor vehicles contains 69 establishments, so every point on 
the cumulative plant curve (except the origin) is conjectural. On average the 
first point on the establishment curve is at about 13 establishments, and 
only 2 or 3 points are usually available within the first 50 plants, so interpola- 
tion often dominates the results. The curves were drawn by a hand that was 
not only free (in a statistical, not an economic, sense) but also linearly in- 
clined. The expression of areas as multiples of the area in the median industry 
is well calculated to make the results incomparable with those of other times 
or places. 

The announced purpose of the study is to illuminate the role of economies 
of large scale production in corporate concentration. It is assumed that 
technological economies are intra-plant, and set a lower limit on company 
size unless one is prepared to sacrifice efficiency to competition. One might 
quarrel in either direction: not all “technological” economies are intra-plant, 
and many plants are large without being more efficient than smaller plants. 
(Moreover, the data have an additional deficiency from this viewpoint: 
body plants, engine plants, assembly plants, etc., are all motor vehicle plants, 
and one should compare only plants and companies performing the same 
function.) But I think the assumption is tolerable for the purpose. Then 
surely it is more direct and informative merely to compare the outputs of the 
biggest companies with the outputs of their biggest plants. 

Of course none of these criticisms vitiates the monograph’s “principal
conclusion”: “the extent of difference between plant and company concentration
varies widely among the different manufacturing industries” (p. 14).
But one can go much farther. T.N.E.C. Monograph No. 27 gives the concen- 
tration ratios of some 1807 products and also the number of plants operated 
by the four largest firms in 1937. Assume perforce that the plants of the four 
largest firms are of equal size (or alternatively, that the plants of these
companies are on average the most efficient size). Then one can calculate the
concentration ratios of each product which would obtain if each company 
owned only one plant: if the concentration ratio of an industry is K (i.e., K 
per cent of the output of the industry is produced by the four largest firms), 
then 4K/N is the plant concentration ratio if the four largest companies had
N plants. The concentration ratios of products with values of $25 million or 
more in 1937 are given in the adjoining table. 
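
The 4K/N adjustment is simple enough to state as a one-line function. The sketch below assumes, as the text does, that the N plants of the four largest firms are of equal size; the function name and the sample figures in the usage note are hypothetical and are not taken from the T.N.E.C. monograph.

```python
def plant_concentration_ratio(k_percent, n_plants):
    """Return 4K/N: the share of output the four largest firms would hold
    if each operated only one of its N equal-sized plants."""
    if n_plants < 4:
        # Four firms cannot operate fewer than four plants between them.
        raise ValueError("the four largest firms operate at least four plants")
    return 4.0 * k_percent / n_plants
```

For instance, a product whose four largest firms hold 80 per cent of output through 16 plants would show `plant_concentration_ratio(80, 16)`, or 20 per cent, on a one-plant-per-company basis.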

I think an industry concentration ratio of 50 or a product concentration 
ratio of 70 is consistent with competition, given the Census industry and 
product classifications; however, many economists would consider these ratios 
too high for competition. On my view, hardly any products had high con- 
centration ratios; on any reasonable view, most products would have low 
concentration ratios if each company had only one plant. 


CONCENTRATION RATIOS OF PRODUCTS BY COMPANY AND PLANT IN MANUFACTURING, 1937


The plant-company relationship raises many other questions. Are raw
materials, markets, or transportation costs relevant to the industry’s plant-
company structure? Are there diseconomies of large plant size? Do the plants 
of multiple-plant firms make more or fewer products than single-plant firms? 
Are plants built by present owners or acquired by merger? They are not 
touched upon in this unimpressive monograph. 


Productivity in the Blast-Furnace and Open Hearth Segments of the Steel In- 
dustry 1920-1946. William T. Hogan. New York: Fordham University Press, 
1950. Pp. vi, 150. $4.00. 


Bruce S. Old, Arthur D. Little, Inc.


THIS book attempts to analyze productivity in the two most important
furnaces in the steel industry, the blast furnace and the open hearth 
furnace. Productivity is defined as the ratio between the three important 
factors of production—capital equipment, labor, and raw materials—and 
output. To the figure of output expressed as man-hours per ton the author 
has attempted to add explanations based on capital equipment and raw 
material changes during the period covered. These explanations leave much to
be desired. 

The subject covered best is the change in man-hours per ton in the blast 
furnaces and open hearths of a single steel plant over the 26 year period 
studied in detail. Nothing is said, however, about how the figures in the
particular plant compare with those of the best plants in the industry or 
with other geographic locations. 

This book would be more valuable to the statistician if it gave an idea of 
the basic variables affecting blast furnace and open hearth output. With 
these firmly in mind, the reader could extrapolate in one direction or another 
as new situations arose, to get some idea as to how they might affect output. 
A full explanation of this would take more space than is allotted to this re- 
view, but three examples of the sort of thing lacking are: (1) There is no 
relationship given between wind blown and iron produced so that the prob- 
able effect of the purchase of a new blower cannot be foretold by the reader. 
(2) No mention is made of the effect of increasing slag volumes on blast 
furnace output. This is the most effective manner of approaching the ques- 
tion of raw material quality, a question which is glossed over in the book. 
(3) No mention is made of the effect on tons of steel produced per hour of 
such important variables as total ferrous charge, charging time, fuel rate and
type, per cent of hot metal charged, etc. 

In summary, the book fails to supply the reader with the orientation in 
the steel industry required if he is to assess intelligently the probable effects 
on productivity of various changes in capital equipment and raw material 
quality. A book of this type probably should be written by a team involving 
an economist, a statistician, and a metallurgist. 


Census of Manufactures: 1947. Vol. I—General Summary. $2.75 (buckram).
Vol. II—Statistics by Industry. $4.75 (buckram). Vol. III—Statistics by States.
$4.50 (buckram). Product Supplement. $2.25 (buckram). U. S. Bureau of the
Census. Washington, D. C.: U. S. Government Printing Office, 1950.


THESE volumes are the latest and probably best of an outstanding series.
Major innovations include a pre-canvass to check the mailing list; a
post-census sample check on completeness of coverage, with resulting estimates
of under-coverage; a retabulation of part of the 1939 reports in accord with 
the revised industrial classification used for 1947, to maintain continuity; 
the use of a short form for small establishments; and an improvement in the 
handling of the problem of disclosure. For the first time, also, there are pre- 
sented man-hours for all industries and comprehensive information on metal- 
working operations, and (for some reason) data on motor vehicles owned or 
leased by manufacturers. The list of products for which quantity data are 
provided is longer than in previous censuses. Unfortunately, no data were 
collected on central offices or the distribution of manufacturers’ sales by 
marketing channel; and some will regret also that the aggregate value of 
products of certain industries and major groups of industries has been 
omitted because of duplication.
S. F.


A World Statistical Survey of Commercial Production: A Geographic Sourcebook.
John C. Weaver and Fred E. Lukermann. Minneapolis, Minnesota: Burgess
Publishing Company, 1950. Pp. v, 146. $4.00. 


THIS is “essentially a source-book of information for the use of college
students in courses dealing with the commercial aspects of economic
geography.” However, statisticians generally will find the volume to be a 
convenient compendium of data, for various countries and the world as a 
whole, on population, land resources, sources of power, and the output of 
important products of mines, forests, fisheries, and agriculture. While some 
historical series are included, major emphasis is placed on data for recent
years.
S. F.


Taxes, The Public Debt and Transfers of Income. Donald C. Miller. Urbana: 
University of Illinois Press, 1950. Pp. xi, 153. $3.00 (cloth), $2.00 (paper). 


Daniel M. Holland, National Bureau of Economic Research


THE distribution among different income groups of the burden of taxation
and the benefits derived from government expenditure comprises a seri- 
ous gap in our knowledge. Exhortations to provide this information have 
been plentiful but few serious studies have been undertaken. Miller’s volume 
constitutes a valuable addition to the small list of contributions in this area 
which, for the United States, includes Colm and Tarasov’s study of the tax 
burden in 1938-39 (TNEC Monograph No. 3); Tarasov’s similar analysis for 
1941 (Supplement 4 to Social Research, 1942); Adler and Schlesinger’s study 
covering both taxes and benefits in 1946-47 (in Fiscal Policies and the Ameri- 
can Economy, edited by Kenyon Poole); and the estimate of the tax burden 
in 1948 by Musgrave, Carroll, Cook, and Frane (National Tax Association 
Journal, March 1951). 

Miller’s particular problem was the economic effect, in 1945, of the trans- 
fer of income necessitated by interest payments to the holders of the federal 
debt. “Did the existence of the transfer problem result in the level of na- 
tional income being substantially different than it otherwise would have 
been?” His answer, very succinctly, is that “in this particular year the public 
debt as a redistributive or deflationary force upon the income flow of the 
economy did not appear to be of great importance from the viewpoint of this 
study.” This is a contribution, no matter how much one may disagree with 
specific features of his analysis or think that, in some cases, he should have 
chosen one rather than another horn of the many dilemmas he confronted. 
Miller furnishes an idea of the order of magnitude of the problem. 

The transfer on the public debt is viewed as a process which involves
interest payments out of tax receipts. Then, with tax burdens and interest
flows both broken down by income classes, for any given class there can be 
computed the percentage which its interest receipts comprise of all interest 
receipts and the taxes falling on it comprise of total federal tax levies. This 
requires, among other things, dealing with unsolved problems of the
incidence of various taxes. In particular, because their amount is large and their
incidence unsettled, Miller uses alternative assumptions for the corporation 
income taxes: (1) that they fell on the shareholders, and (2) that two-thirds 
of them were borne by the shareholders and the remaining one-third by 
consumers in the form of higher prices of corporate products. A comparison 
of the two percentages is then made to see whether each income class ob- 
tained a net loss or gain through the transfer. Finally, following the Keyne- 
sian theory, the effect on spending and, therefore, on the level of national 
income is determined by applying the appropriate marginal propensities to 
consume to the income redistributed. This very brief description glosses over 
the mass of calculation and data adjustments which Miller had to perform, 
the assumptions he had to make, the ingenuity he showed. Because of these 
complexities, Miller’s results should not be used for any other purpose with- 
out a full examination of his methods. 

By Miller’s calculations, assuming the corporation income tax to fall 
wholly on stockholders, “transfer of income from the low income groups to 
the high (above $5,000) of some 87 millions occurred in 1945.” Assuming 
that two-thirds of the corporation income tax was borne by the stockholders 
and the remainder was shifted via higher prices of consumer goods, “the 
transfer from the low income groups to the high (above $5,000) income classes 
was 239 million dollars.” The effects on consumption were equally slight. 
Using the latter redistribution figure, “this transfer served to reduce total 
spending on consumption by some 125 million dollars . . . as the 239 millions 
were transferred from low income groups with a marginal propensity to con- 
sume of roughly 0.80 to high income groups exhibiting a marginal propensity 
to consume of 0.27.” To this must be added 100 million dollars of interest 
which went to foreigners and the Federal Reserve Banks, both considered 
“in their entirety as leakages.” The author concludes: “Thus, from this stand- 
point, total consumption in 1945 was reduced by some 225 millions as a result 
of the transfer of income due to the public debt.” 
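
The arithmetic behind the quoted figures can be checked with a short sketch. Only the $239 million transfer, the two marginal propensities to consume, and the $100 million of leakages come from the passage; the function name is an assumption made here for illustration.

```python
def consumption_drop(transfer, mpc_low, mpc_high):
    """Fall in consumption when `transfer` moves from a group with marginal
    propensity to consume `mpc_low` to a group with `mpc_high`."""
    return transfer * (mpc_low - mpc_high)


# Figures quoted from Miller, in millions of dollars.
transfer = 239.0                 # low-income to high-income (above $5,000) transfer
mpc_low, mpc_high = 0.80, 0.27   # marginal propensities to consume
leakage = 100.0                  # interest to foreigners and Federal Reserve Banks

drop = consumption_drop(transfer, mpc_low, mpc_high)  # roughly 126.7
total_with_rounding = 125.0 + leakage                 # Miller's "some 225 millions"
```

The product 239 × 0.53 comes to about 126.7, so the quoted "some 125 million" appears to be a rounding, and the total of 225 follows from adding the 100 million of leakages to the rounded figure.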

There appears to be some inconsistency in the way two rather similar 
problems were handled. Miller argues that the federal interest receipts trans- 
ferred to surplus by commercial banks, estimated at $317 million, led to 
greater consumption expenditures because of the increase in stockholder 
equity. (He points out that this procedure imparts an upward bias to the 
results, for the effect is considered to be as great as if the equivalent amount 
of funds had actually been paid out in dividends to the stockholders; but this 
upward bias tends to be compensated for by what appears to be an under- 
estimate in another portion of the estimates.) In another part of his analysis, 
however, series A-F bond interest accruals, which totalled about $400 million 
on Miller’s estimate in 1945, are left out because they were not actual “trans- 
fers.” But is not the same point involved as in the case of transfers to surplus 
of commercial banks? Would not the increase in the value of their bonds tend 
to increase the consumption expenditures of Series A-F bondholders? 

It is to be regretted that Miller did not digress a little and, using
information easily available from his study (pp. 46 and 86), estimate the effective
rates of tax for the federal tax system as a whole by income classes. 

The author’s method of calculating the effect on consumption of the in- 
come transfer due to interest payments on the public debt seems to have an 
arbitrary element. Referring back to the quotation above on the effect on 
consumption under the assumption that two-thirds of corporation income 
taxes were borne by stockholders, the results depend on where the dividing 
line between high and low income is taken. The quotation refers to the gains 
of the over-$5000 class at the expense of the income classes up to $5000, with 
the under-$1000 class not really being involved at all because of a negligible 
amount of interest transferred. Surely high and low is a matter of taste here, 
and he could just as logically have computed the transfer with $2000 as the 
dividing line. In this case the income transfer from low to high would have 
been only about $70 million, and the differences between the marginal
propensities to consume of the high and low groups would have been much less
than 0.50, so the fall in consumption would have been less than $35 million. 
Since the quantities involved are small, it makes little difference for the 1945 
results where the dividing line is taken. But an important matter of principle 
is involved which may have greater quantitative repercussions if Dr. 
Miller’s techniques are used for other years. Where the dividing line is set 
determines first the amount of income transfer and secondly, via this and the 
differences in marginal propensities to consume of the high and low groups, 
the magnitude of the effect on consumption. A more logical method would 
have been to apply the appropriate value of the marginal propensity to
consume to each income class’s net income gain or loss, and take a net total of
these consumption increments or decrements. 
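
The class-by-class alternative proposed above might be sketched as follows. The net gains are rough figures chosen to agree with the two transfers mentioned in this review (about $70 million across a $2,000 line and $239 million across a $5,000 line); the class labels, the function name, and every marginal propensity except the 0.27 quoted for the top group are hypothetical.

```python
def net_consumption_effect(net_gain_by_class, mpc_by_class):
    """Sum each class's net income gain (or loss) times its own marginal
    propensity to consume; no high/low dividing line is needed."""
    return sum(net_gain_by_class[c] * mpc_by_class[c] for c in net_gain_by_class)


# Net gain (+) or loss (-) from the transfer, in millions of dollars (illustrative).
gains = {
    "under_1000": 0.0,      # negligible interest transferred
    "1000_2000": -70.0,
    "2000_5000": -169.0,
    "over_5000": 239.0,
}

# Marginal propensities to consume by class (hypothetical, except 0.27).
mpcs = {
    "under_1000": 0.90,
    "1000_2000": 0.85,
    "2000_5000": 0.75,
    "over_5000": 0.27,
}

effect = net_consumption_effect(gains, mpcs)  # negative means consumption falls
```

Because each class carries its own propensity, the result no longer depends on where an arbitrary line between "high" and "low" incomes is drawn.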


La Méthode Statistique dans L’Industrie. Laurent, André-G. Paris: Presses 
Universitaires de France, 1950. Pp. 134. 


Harold A. Freeman, Massachusetts Institute of Technology


THIS is not a quality control manual; it is an introduction to the subject
for those who may be interested in the general nature and contours of 
statistical quality control, rather than its specific methods. It is non-mathe- 
matical, though there is evidence of the author’s interest in statistical 
theory, including current developments. 

Part 1 describes some elements of the statistical theory underlying indus- 
trial applications. Notions of frequency, probability, population, sample, and 
random sampling are introduced. Also a few specific distributions, among 
them Gibrat’s, then the Geary-Pearson test for normality. There is a section 
on estimation—Fisher’s criteria, Bayes’ rule, interval estimation; a section 
on testing hypotheses includes brief accounts of errors of both kinds, and a 
few examples of power functions. Variance analysis, regression, discriminant 
functions, and canonical correlation come next but by this point some of the 
topics are being “discussed” in a paragraph, and a literary one at that. 

Part 2 has some useful things to say about the statistical aspects of
industrial problems. There is a section on sampling of mixtures, seven
interesting pages on granulometric analysis, a discussion of the industrial meaning
of normal distribution areas and the statistical theory of rupture. 

Part 3 deals with statistical methods in production. There is a sketchy 
history of quality control from its beginning through World War II; the 
emphasis here is on American and British experience, with a few notes on 
developments in other countries. There follows a quite non-technical discus- 
sion of elementary quality control, control limits for discrete and continuous 
variables, acceptance inspection, single, double and sequential sampling, OC 
curves and the AOQL concept, among other matters. The list of topics dis- 
cussed is probably as impressive as the discussion. 

The book is interesting, though after a while non-analytic discussions of 
statistical techniques seem unrewarding. But I am sure that the book will 
serve the useful purpose of introducing quality control to those who know 
nothing about it. For those who know neither quality control nor French, it 
offers an interesting opportunity to learn both. 

The proofreading on small details is casual. Counting as faults those errors 
and omissions which most American publishers would so count, I reached a 
total of thirty-four on the one page of bibliography before quitting. 


Experimental Design in Psychological Research. Allen L. Edwards (Professor of
Psychology, University of Washington). New York: Rinehart and Company, 
Inc., 1950. Pp. xiv, 446. 


William Kruskal, University of Chicago


THE purpose of this text is to present to students of psychology and other
behavioral sciences the elements of modern statistical method. The pres- 
entation is largely within a psychological framework in which the mathe- 
matical level is kept at or below that of high-school algebra. Although the 
book emphasizes analysis of variance procedures, it contains chapters on 
the elements of probability, chi-square-like methods, and the correlation 
coefficient; so that it may be described as a psychologically oriented version 
of material covered in such standard texts as Snedecor’s Statistical Methods. 

Considering the limited mathematical knowledge that the author presupposes,
the exposition is fairly clear, and particularly so in regard to such
pedagogically troublesome points as correction for continuity, the nature of 
interactions, and the correspondence between systems of tests and confidence 
sets. One major exception is the treatment of components of variance analy- 
sis, versions of which—actually mixed models—are discussed at length. I 
predict heavy going and confusion for the psychologist-reader of these chap- 
ters, mainly 14 and 15. On the other hand, the mixed models described here 
may be new to some psychologists, and the detailed description of various 
examples should serve to popularize them. 

This text is self-contained, but younger students of psychology may find
it difficult to understand unless they have read a more elementary
introduction to the subject. Presumably the text is intended to follow Professor
Edwards’ earlier book, Statistical Analysis for Students in Psychology and
Education.¹ At any rate the two texts overlap considerably, but differ in that
the present text extends the discussion to a much greater variety of experi- 
mental design, whereas the earlier text discusses at length more elementary 
topics, such as those of standard descriptive statistics. 

The typography and other physical characteristics of the volume are ex- 
cellent. There is a good index, fairly convenient tables, a bibliography 
(mainly of statistically interesting articles in psychological journals), and 
many illuminating yet simple examples together with answers. 

I have a number of criticisms of this textbook. These should be prefaced 
by the remark that many or all of its competitors are subject to many or all 
of the same criticisms. 

First, a great deal of important material is completely omitted. For ex- 
ample, nothing at all is said about the analysis of variance power function. 
Granted that a text of this sort cannot go into theoretical details, still some- 
thing can be said about power, and to omit it leaves the reader with the im- 
pression that only errors of the first kind are important. (Although brief 
mention of Type II error is made in Chapter 2, it does not appear again and 
discussions of relative efficiency, such as that presented on pages 270-273, 
suggest strongly to the reader that power is not important.) Some other
topics which have been completely omitted are: the treatment of “missing 
values” in standard designs, the more common “non-orthogonal” designs, 
non-parametric methods, polynomial regression, rules of thumb for practical 
computation, and multivariate analysis. Some of these topics may have been 
omitted because of the unfortunately restricted technical meaning that the 
term “Design of Experiments” has come to have. However, the reader of 
this book is promised by its title and tone a general survey of statistical 
methods applicable to psychological experimentation, and this promise is 
not nearly fulfilled. 

Second, to a number of important problems of applied statistics this text 
gives dogmatic answers without a hint that such answers are subject to 
grave criticisms, and that, in fact, generally accepted answers do not exist at 
present. For example, preliminary tests of significance are used with aban- 
don, and without the comment that their use affects the power of the over-all 
procedure. A particularly flagrant case of this is presented on page 255, based 
on the statement that if the null hypothesis that interactions of a given order 
are zero is not rejected, then “this fact provides good evidence that none of 
the higher-order interactions will be significant. . . .” (This statement may or 
may not be true as an empirical generalization, but it surely does not follow 
from the general theory.) As another example, no mention is made of the 
risks inherent in the performance of many tests on the same body of experimental
data. For example, the author’s attitude is that the over-all sum of
squares in a factorial experiment should be broken down fully in the traditional
sense, and that all possible tests within the breakdown should be
made as a matter of course.

¹ New York, Rinehart and Company, Inc., 1946. Reviewed in this JOURNAL by E. L. Grant (v. 41,
pp. 397-398) and J. H. Curtiss (v. 42, pp. 315-318).

412 AMERICAN STATISTICAL ASSOCIATION JOURNAL, SEPTEMBER 1951

Third, the mathematical models of the statistical procedures described are 
nowhere clearly stated, even in words. Thus additivity and interaction be- 
come difficult to explain. The distinction between standard analysis of vari- 
ance and components of variance models becomes blurred. And the manipula- 
tions described, especially in the case of analysis of covariance, become non- 
motivated, follow-the-leader methods. I doubt that most students of this 
book would be able to apply its methods to cases differing from those given 
in detail in the text. 

A full list of other comments would be too lengthy for this review, but the 
following should be mentioned:

1. There is persistent notational confusion between population parameters 
and their sample analogues. 

2. On p. 27 there appears the statement: “All experiments involve the 
testing of some hypothesis, and this hypothesis is often referred to as the 
null hypothesis.” 

3. Actual acceptance or rejection of the null hypothesis are treated as the 
only possible values in the two-valued decision situation, rather than as 
conventional token names for such values. 

4. On pp. 78-79 there appears this statement: “Experiments involving 
two or more samples and the hypothesis that the samples are from a common 
population are logically evaluated by a two-tailed test of significance.” 

5. A number of well-known and useful tools are not mentioned. In the 
discussion of Fisher’s exact test for the 2×2 table there is no reference to the
Mainland or Finney tables which make this test really practical. In the dis- 
cussion of the binomial parameter there is no reference to the Clopper- 
Pearson graphs. In the discussion of the correlation coefficient, there is no 
reference to David’s tables. 

6. Non-normality is apparently measured by skewness alone. The discus- 
sion of non-normality is based largely on a small unpublished sampling ex- 
periment with only indirect reference to the many large-scale sampling ex- 
periments described in the literature. 

7. Nothing is said about tests or confidence intervals for population vari- 
ance in the normal case. 
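
As an illustration of one tool named in item 5, the limits read from the Clopper-Pearson graphs can also be found numerically. The sketch below is a minimal version, assuming the usual equal-tail definition of the exact binomial limits and using bisection on the binomial tail sums; the function names are illustrative and do not come from the book or from this review.

```python
from math import comb


def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(x + 1))


def clopper_pearson(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Exact equal-tail limits for a binomial proportion, found by bisection."""
    tail = (1.0 - conf) / 2.0

    def bisect(too_small) -> float:
        # Halve [0, 1] repeatedly; too_small(p) says the root lies above p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2.0
            if too_small(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    # Lower limit: the p at which P(X >= x) equals tail; P(X >= x) rises with p.
    lower = 0.0 if x == 0 else bisect(lambda p: 1.0 - binom_cdf(x - 1, n, p) < tail)
    # Upper limit: the p at which P(X <= x) equals tail; P(X <= x) falls as p rises.
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) > tail)
    return lower, upper


print(clopper_pearson(5, 10))
```

With no successes in n trials the upper limit has the closed form 1 - (tail)^(1/n), which gives a quick check on the bisection.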

To summarize: this text should be useful as supplementary reading for 
those students of the behavioral sciences who more readily learn statistical 
methods when they are presented in the specific context of psychological re- 
search. For mature psychologists who are already somewhat familiar with 
statistical methods the chapters on mixed models in the analysis of variance 
may prove rewarding as a description of the applications of that technique. 
However, the reader of this text should be warned that it describes only a 
portion of statistical methodology, and that many of its statements are dog- 
matic and by no means generally accepted. 

Cinq Années de Sondages. Institut Universitaire d’Information sociale et
économique (Centre belge pour l’étude de l’opinion et des marchés). Parc Léopold,
Brussels, Belgium, 1950. Pp. 74. 


This publication is the last (sixth) 1950 issue of a bi-monthly bulletin
issued by the Institut Universitaire. It summarizes the results of
sampling surveys on economic, political, and social questions made in Bel- 
gium by the Institut from 1946 to the end of 1950. The questions actually 
asked are given along with replies tabulated on an over-all basis (i.e., no 
cross-tabulations). There is no description of sampling methods or other 
phases of the actual conduct of the studies. However, greater detail is ap- 
parently given in earlier bulletins, and this issue lists the salient contents of 
all earlier bulletins from 1946 to date. Moreover, for all questions here sum- 
marized, a reference is given to the original report. 

Judging from this bulletin, the Institut’s questioning methods closely 
parallel those commonly used by American opinion and market research 
organizations. The Institut has studied a wide range of subjects of great 
interest and importance. The following sample (page 62) is taken from a 
study originally reported in issue No. 6, 1948, pp. 26-27 (wording
translated from the original French):

                                                 Yes       No     No opinion
Do you believe that the American
  government sincerely wants peace?             68.5%    13.9%      17.6%
Do you believe that the government of
  the U.S.S.R. sincerely wants peace?           18.0%    60.2%      21.8%
H. V. R. 